Crash Data

As vehicle owners, we are always thinking about our own safety and that of our passengers when on the road. Drivers pick up many pieces of common-sense wisdom while they’re learning, and state DOL exams and driver’s ed classes supply more data-based information on safe driving practices. Yet it’s been years since either of us has taken a driving course or licensing test, and most of our safe-driving skills and knowledge are now based on habit and intuition. As aspiring data scientists, we understand that while our habits may be good, our decision making can still be improved by what the data has to tell us. A large body of information on all car crashes in our local area, together with the conclusions that can be drawn from it, would be the perfect way to augment our driving intuition with hard data.

Fortunately, we may have just that. Our 2018 Portland Car Crashes dataset (portland_crashes_2018) contains information on every individual in or on a vehicle involved in a motor vehicle accident, as well as crash details for each accident. All vehicle and participant information comes from the Oregon Department of Transportation (ODOT) and is curated from an existing data package titled pdx_crash_2018.

What We Did:

The original data set was composed of multiple spreadsheets littered with columns that were devoid of any useful information. We merged the most relevant spreadsheets into one and removed columns that were entirely blank, duplicated other columns, or contained too little information to draw meaningful conclusions from. We also renamed many columns so that their names more intuitively convey the information they record. Additionally, the ODOT manual that explains how variable values are represented in the original pdx_crash_2018 data package is quite long and often confusing, so we created a help file for our data package that concisely describes the variables in our new data set and the values they take on.
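The column clean-up described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code we used to build the package: the column names (crash_id, light, light_copy, notes) and the two sample rows are hypothetical, and the "too little information" rule is left out because it depends on a judgment call.

```python
def drop_uninformative_columns(rows):
    """Drop columns that are entirely blank or that duplicate an earlier column."""
    kept = []
    seen_values = set()
    for col in rows[0].keys():
        values = tuple(row[col] for row in rows)
        # Skip columns where every value is blank.
        if all(v in ("", None) for v in values):
            continue
        # Skip columns whose values exactly repeat an earlier column.
        if values in seen_values:
            continue
        seen_values.add(values)
        kept.append(col)
    return [{col: row[col] for col in kept} for row in rows]


# Hypothetical sample rows; the real data has many more columns.
rows = [
    {"crash_id": 1, "light": "day", "light_copy": "day", "notes": ""},
    {"crash_id": 2, "light": "night", "light_copy": "night", "notes": ""},
]
cleaned = drop_uninformative_columns(rows)
print(list(cleaned[0].keys()))
```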

What You Can Do:

It’s our hope that you use this dataset to better inform yourself and others about the factors that are associated with crashes. How might you do this? Well, for example, suppose you were curious about the relationship between seatbelt use and injury severity. With the data included in our portland_crashes_2018 data package, you could compare the proportion of fatal injuries among vehicle occupants who were wearing seatbelts with the proportion among those who were not. Doing so would allow you to determine whether fatal outcomes occur more often with or without a seatbelt.
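As a rough sketch of the seatbelt comparison, the snippet below computes the share of fatal outcomes within each seatbelt group. The records are made up for illustration, and the two fields (a seatbelt flag and a severity label) are assumed stand-ins rather than the actual variable names or values in portland_crashes_2018.

```python
from collections import Counter

# Hypothetical (seatbelt_used, injury_severity) pairs, one per occupant.
participants = [
    (True, "none"), (True, "minor"), (True, "fatal"), (True, "none"),
    (False, "fatal"), (False, "minor"),
]


def fatal_proportion(records):
    """Return the share of fatal outcomes within each seatbelt group."""
    totals = Counter()
    fatal = Counter()
    for belted, severity in records:
        totals[belted] += 1
        if severity == "fatal":
            fatal[belted] += 1
    return {belted: fatal[belted] / totals[belted] for belted in totals}


fatal_share = fatal_proportion(participants)
print(fatal_share)
```

Comparing proportions within each group, rather than raw counts, keeps the larger group from dominating the comparison.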

What You Can See:

For another example, we wanted to see under which lighting conditions you are most likely to collide with a fixed object like a light post or fence. The majority of crashes occur during the day, so it is not fair to simply count the number of collisions with fixed objects by time of day. Instead, the graph below shows the proportion of accidents that involved a collision with a fixed object, for each type of lighting (night, dawn, day, etc.). We can clearly see a relationship between visibility and fixed-object crashes. Because color illustrates the number of crashes, we can still see that the largest number of fixed-object crashes occurs during the day, but the highest proportion occurs at night.
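The difference between counting crashes and taking proportions can be shown with a small sketch. The numbers below are invented to mirror the pattern described above and are not taken from the dataset; the lighting labels and the fixed-object flag are likewise assumed names.

```python
from collections import defaultdict

# Hypothetical (lighting, hit_fixed_object) pairs, one per crash.
crashes = (
    [("day", False)] * 6 + [("day", True)] * 2
    + [("night", False)] * 1 + [("night", True)] * 1
)


def fixed_object_summary(records):
    """Return {lighting: (fixed_object_count, proportion_of_crashes)}."""
    totals = defaultdict(int)
    fixed = defaultdict(int)
    for lighting, hit_fixed in records:
        totals[lighting] += 1
        if hit_fixed:
            fixed[lighting] += 1
    return {k: (fixed[k], fixed[k] / totals[k]) for k in totals}


summary = fixed_object_summary(crashes)
for lighting, (count, share) in summary.items():
    print(lighting, count, share)
```

In this toy data, day has more fixed-object crashes in absolute terms, but night has the higher proportion, which is why the raw count alone can mislead.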

Despite the informative nature of data, we have to remind ourselves of its limitations. Data is rarely, if ever, complete; any measurement is subject to measurement error, and consequently, data-based decisions are too. While our portland_crashes_2018 data package will be useful in drawing conclusions about associations between crashes and contributing factors, it’s important to remember that it only records documented crashes, and many crashes may go unnoticed or unreported. It is also important not to mistake correlation for causation. For instance, in the plot above, we see a relationship between less light and a higher proportion of fixed-object crashes, but we do not know whether this is caused by a lack of visibility or whether there are simply fewer cars to run into at night, which would inflate the nighttime proportion.