library(tidyverse)
library(tidycensus)
library(tigris)     # for loading basemap
library(broom)      # for working with the regression
library(knitr)      # for kable
library(kableExtra) # so that the tables look less awful.
library(tufte)      # so that we can celebrate how amazing Edward Tufte is and how great his books look
How much do education levels influence support for Trump? How likely is a senator to vote in line with Trump, given 2016 election results and the education levels in their state? To answer these questions, we used FiveThirtyEight’s Tracking Trump In Congress dataset and the U.S. Census Bureau Educational Attainment Survey.
FiveThirtyEight has created three different scores to indicate how senators vote with respect to Trump. The Trump Score is the percentage of votes on which a senator’s position agrees with Trump’s. The Predicted Score is how often we would expect a senator to agree with Trump based on 2016 election results. The Trump Plus-Minus is the Trump Score minus the Predicted Score. More info on this can be found on FiveThirtyEight here.
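As a small illustration of how the three scores relate, the sketch below computes a plus-minus from made-up values. The data frame `scores` and every number in it are hypothetical; only the column names `agree_pct` and `predicted_agree` are taken from the averages table used later in this report.

```r
# Hypothetical senators and scores, invented for illustration only
scores <- data.frame(
  senator = c("A", "B", "C"),
  agree_pct = c(0.90, 0.25, 0.60),       # Trump Score
  predicted_agree = c(0.80, 0.30, 0.75)  # Predicted Score
)

# Trump Plus-Minus = Trump Score - Predicted Score
scores$plus_minus <- scores$agree_pct - scores$predicted_agree

print(scores)
```

A positive plus-minus means the senator sides with Trump more often than the 2016 results in their state would suggest, and a negative value means less often.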
Using these scores, we can examine how often senators vote with the President relative to their constituents’ education levels. We will be analyzing the 115th and 116th Congresses combined, which cover the time Donald Trump has been in office.
Data Preparation and Cleaning
We downloaded FiveThirtyEight’s data from their GitHub repo and loaded it into R with the function read_csv. The data consists of two tables: vote_predictions, in which an observation is a representative’s vote, and averages, in which an observation is a representative in a particular session. For this project, we will be using the averages table, as we are interested in the senators’ voting patterns rather than specific votes.
We then filtered observations to only include senators, keeping the rows coded congress == 0, which we use for the combined 115th and 116th Congresses.
averages <- read_csv("/Users/mcconvil/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp9/averages.csv") %>%
  filter(congress == 0, chamber == "senate") %>%
  # drop:
  #   chamber because it is "senate" for all observations
  #   district because senators are elected on the state level
  #   congress because it is "0" for all observations
  select(-chamber, -district, -congress) %>%
  mutate(party = as_factor(party),
         state = as_factor(state))
We used the R package tidycensus to grab education data from the US Census Bureau. tidycensus helps R users download data from the census without having to learn how to use the census API.
# api_key is assumed to hold a Census API key defined elsewhere (not shown here)
education <- get_acs(geography = "state",
                     table = "C15003",
                     output = "wide",
                     survey = "acs1",
                     key = api_key)
After that, we manipulate the columns to create useful, human-readable variables. These are:
- % Less than High School
- % High School
- % GED
- % Some college
- % College
- % Graduate education
We then aggregate these variables into broader categories:
- % High school or less
- % Bachelor’s degree or more
With these aggregated variables, we will be able to determine what impact education levels have on a state’s support for Trump and the voting patterns of senators from that state.
education <- education %>%
  rename(population = C15003_001E) %>%
  # create the small education bins
  mutate(
    education_small_bin_percent_less_than_highschool =
      (C15003_002E + C15003_003E + C15003_004E + C15003_005E +
         C15003_006E + C15003_007E + C15003_008E + C15003_009E) / population,
    education_small_bin_percent_highschool = C15003_010E / population,
    education_small_bin_percent_ged = C15003_011E / population,
    education_small_bin_percent_some_college =
      (C15003_012E + C15003_013E + C15003_014E) / population,
    education_small_bin_percent_college = C15003_015E / population,
    education_small_bin_percent_graduate =
      (C15003_016E + C15003_017E + C15003_018E) / population
  ) %>%
  # add up the small bins into the bins we'll use
  mutate(
    education_high_school_or_less =
      education_small_bin_percent_ged +
      education_small_bin_percent_highschool +
      education_small_bin_percent_less_than_highschool,
    education_bachelors_or_more =
      education_small_bin_percent_college +
      education_small_bin_percent_graduate
  ) %>%
  # rename the variable to make future merging easier
  rename(state = NAME) %>%
  # drop the variables we no longer need (the raw estimates and margins of error)
  select(GEOID, state, population, starts_with("education"))
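To make the binning arithmetic concrete, here is a minimal base R sketch on a single made-up state. The counts and the names in `counts` are hypothetical and are not census figures; the point is only that each small bin is a count divided by the population, and each broad bin is a sum of small bins.

```r
# Hypothetical counts for one state (invented, not real census figures)
counts <- c(less_than_hs = 100, highschool = 250, ged = 50,
            some_college = 300, college = 200, graduate = 100)
population <- sum(counts)

# Small bins: each count as a share of the population
shares <- counts / population

# Broad bins: sums of the relevant small bins
high_school_or_less <- sum(shares[c("less_than_hs", "highschool", "ged")])
bachelors_or_more <- sum(shares[c("college", "graduate")])

print(round(shares, 3))
print(c(high_school_or_less = high_school_or_less,
        bachelors_or_more = bachelors_or_more))
```

Note that the "some college" bin falls into neither broad category, so the two aggregated shares do not need to sum to one.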
In order to draw maps, we will require a base map. A base map is a blank map that data can be mapped onto. We can download one from the tigris package with the following code:
# download state geometry files
base_map <- states(cb = TRUE, class = "sf", progress_bar = FALSE) %>%
  select(geometry, state = STUSPS, NAME)

theme_tufte <- theme(panel.background = element_rect(fill = "#fffff8", colour = "#fffff8"),
                     plot.background = element_rect(fill = "#fffff8", colour = "#fffff8"))
We then aggregate the voting data by state, use joins to add in a base map and education data, and drop Alaska and Hawaii to simplify map-drawing.
df <- averages %>%
  group_by(state) %>%
  summarize(trump_vote = mean(net_trump_vote),
            predicted_agree = mean(predicted_agree),
            actual_agree = mean(agree_pct)) %>%
  left_join(base_map, by = c("state" = "state")) %>%
  rename(state = NAME, state_code = state) %>%
  left_join(education, by = c("state" = "state")) %>%
  # exclude Alaska and Hawaii because they don't fit on the map
  filter(!(state %in% c("Hawaii", "Alaska"))) %>%
  # we won't need it
  select(-GEOID)
## Warning: Column `state` joining factor and character vector, coercing into ## character vector
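The aggregate-then-join step can be sketched on toy data with base R. The tables `senators` and `education_toy`, the state codes "AA" and "BB", and all values are hypothetical; `aggregate` and `merge` stand in here for the `summarize` and `left_join` calls above.

```r
# Hypothetical senators: two per state, values invented for illustration
senators <- data.frame(
  state = c("AA", "AA", "BB", "BB"),
  agree_pct = c(0.9, 0.7, 0.2, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical state-level education shares
education_toy <- data.frame(
  state = c("AA", "BB"),
  education_bachelors_or_more = c(0.25, 0.40),
  stringsAsFactors = FALSE
)

# One row per state: the mean of that state's senators
by_state <- aggregate(agree_pct ~ state, data = senators, FUN = mean)

# Left join: keep every aggregated state, attach education where it matches
joined <- merge(by_state, education_toy, by = "state", all.x = TRUE)

print(joined)
```

Both toy tables store the state key as character. The warning above arises because the real data joins a factor column to a character column, which the join resolves by coercing to character.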
Where did Trump do well in 2016?
Now that we’ve cleaned and prepared our data, we’re ready to draw some maps. ggplot2 can draw maps with geom_sf. Let’s start by drawing a map of Trump’s support in the 2016 election.
# note: theme_map is not defined in the code shown in this report;
# it is assumed to be a theme object created elsewhere
ggplot(df, aes(fill = trump_vote, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(fill = "Margin (%)",
       title = "Trump's Share of the 2016 Election Votes Minus Clinton's") +
  theme_tufte
Based on the map, Trump has noticeably more support in the middle of the country than on either coast. Wyoming and California stand out as the two extremes, in opposite directions.
Does having a bachelor’s degree influence support for Trump?
Now let’s test whether the share of a state’s population with a bachelor’s degree or more is related to support for Trump. We can do this by running a regression, which estimates how an outcome changes with an explanatory variable. We can then grab the results of the regression with the function tidy from the broom package, which helps R users work with regressions. As shown by the table below, states where a higher percentage of the population has a bachelor’s degree or more cast relatively fewer votes for Donald Trump in 2016.
tidy(lm(trump_vote ~ education_bachelors_or_more, df)) %>%
  # drop the intercept term, as we are only interested in the direction of the effect
  filter(term != "(Intercept)") %>%
  select(term, estimate) %>%
  # make the explanatory variable name nice and human readable
  mutate(term = if_else(term == "education_bachelors_or_more",
                        "Having a bachelor's degree or more",
                        term)) %>%
  kable(col.names = c("Explanatory Variable", "Effect on Voting for Trump")) %>%
  kable_styling(position = "float_left")
|Explanatory Variable|Effect on Voting for Trump|
|Having a bachelor’s degree or more|-269.9827|
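The size of this estimate is easier to read after rescaling. Because education_bachelors_or_more is stored as a proportion between 0 and 1, the slope describes a move from no one to everyone holding a bachelor’s degree. The sketch below converts it to the change associated with a one-percentage-point increase, assuming trump_vote is measured in percentage points (as the "Margin (%)" legend on the map suggests).

```r
# Slope from the regression table above
estimate <- -269.9827

# A one-percentage-point increase in the bachelor's share is a 0.01 change
# in the explanatory variable, so scale the slope by 0.01
per_point <- estimate * 0.01

print(per_point)
```

Under that assumption, each additional percentage point of bachelor’s degree holders is associated with a margin roughly 2.7 points less favorable to Trump.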
Now let’s look at which states bucked this trend. Regressions produce residuals, meaning the difference between what actually happened and what the regression predicts from the explanatory variable. We can extract the residuals using the base R function resid and attach them to the dataframe using mutate:
# confirmed that this works: https://stackoverflow.com/questions/20506984/are-residuals-in-linear-regression-following-same-order-of-original-data-frame-r
df <- df %>%
  mutate(college_or_more_support_residual = resid(lm(trump_vote ~ education_bachelors_or_more, df)))
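A minimal sketch of what resid returns, using a hypothetical data frame `toy` with invented values: each residual is the observed outcome minus the fitted value, returned in the same row order as the data.

```r
# Hypothetical data: x is a share with a degree, y is a vote margin (invented values)
toy <- data.frame(
  x = c(0.20, 0.25, 0.30, 0.35, 0.40),
  y = c(30, 12, 5, -8, -20)
)

fit <- lm(y ~ x, data = toy)

# Attach residuals as a column; one residual per row, in row order
toy$residual <- resid(fit)

# The same quantity computed by hand: observed minus fitted
manual <- toy$y - fitted(fit)

print(toy)
print(all.equal(unname(toy$residual), unname(manual)))
```

This row-for-row alignment holds when no rows are dropped during fitting. With missing values and the default na.action, lm omits those rows and resid returns fewer values than the data has rows, so the column assignment would need extra care.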
Now that we’ve made the residuals a column in the dataframe, we can draw a map of them. This allows us to see which states voted for Trump more or less than their education levels would predict.
ggplot(df, aes(fill = college_or_more_support_residual, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(title = "Which states voted for Trump more than their education levels predicted?",
       fill = "Positive is More") +
  theme_tufte
From the above map, we can see that certain states, such as Wyoming, voted for Trump at much higher levels than expected, whereas Trump underperformed in Nevada, California, and New Mexico relative to those states’ education levels. Let’s examine these observations (and Ohio as a point of comparison) to see what is going on. To do so, we’ll subset the data using filter:
df %>%
  filter(state %in% c("Wyoming", "Ohio", "California", "Nevada", "New Mexico")) %>%
  mutate(education_bachelors_or_more = education_bachelors_or_more * 100) %>%
  select(state, trump_vote, education_bachelors_or_more, college_or_more_support_residual) %>%
  kable(col.names = c("State", "Trump Vote Share", "Bachelors or More %", "Vote Relative to Education")) %>%
  kable_styling(position = "float_left")
|State|Trump Vote Share|Bachelors or More %|Vote Relative to Education|
Looking at the table, it appears that Wyoming is an outlier simply because it voted overwhelmingly for Trump. This is likely due to its extremely rural, conservative nature. The opposite is true of California. In contrast, both Nevada and New Mexico simply voted less for Trump than their education levels would predict.
Do senators vote differently depending on their constituents’ education levels?
ggplot(df, aes(fill = actual_agree, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(title = "Which senators vote with Trump?",
       fill = "% of the Time") +
  theme_tufte
Now let’s examine whether the education level of a senator’s constituents is related to the senator’s voting pattern. First, we’ll run a regression of how often senators agree with Trump on the share of their state with a bachelor’s degree or more. As shown by the table below, senators with a more highly educated constituency vote with Trump less often.
tidy(lm(actual_agree ~ education_bachelors_or_more, df)) %>%
  filter(term != "(Intercept)") %>%
  select(term, estimate) %>%
  mutate(term = if_else(term == "education_bachelors_or_more",
                        "Having a bachelor's degree or more",
                        term)) %>%
  kable(col.names = c("Explanatory Variable", "Effect on Senator Agreeing with Trump")) %>%
  kable_styling(position = "float_left")
|Explanatory Variable|Effect on Senator Agreeing with Trump|
|Having a bachelor’s degree or more|-3.758465|
After adjusting for education, the map looks like this:
df <- df %>%
  mutate(college_or_more_senator_residual = resid(lm(actual_agree ~ education_bachelors_or_more, df)))

ggplot(df, aes(fill = college_or_more_senator_residual, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(title = "Education-adjusted Senator Voting Patterns",
       fill = "Voting with Trump") +
  theme_tufte
After adjusting for education, there are some major changes in senators’ voting patterns. The partisan lean of most of the South is significantly reduced, suggesting that lower levels of education among their constituents account for much of why southern senators vote with Trump. Georgia and North Carolina buck this trend. Anecdotally, in North Carolina this is likely due to the highly educated members of the population being concentrated in the Raleigh-Durham-Chapel Hill metro area. Interestingly, senators from Utah, Colorado, Kansas, and Nebraska all vote with Trump much more often than education would predict. We suspect this also has to do with geographic clustering of the highly educated members of their populations. Finally, senators from New Mexico vote with Trump far less than education would predict.
While education did a fairly good job of explaining voting for Trump at both the constituent and senator level in some states, it did not in others. Geographic distribution of the highly educated part of the population explains this disparity in a few states, such as North Carolina. In the remaining states, other factors must be at play. Scholarly work on the other predictors of support for Donald Trump can be read here, here, or here.