Actual vs Predicted Senate Vote compared to Education Levels across U.S. States

library(tidyverse)
library(tidycensus)
library(tigris) # for loading basemap
library(broom) # for working with the regression
library(knitr) # for kable
library(kableExtra) # so that the tables look less awful.
library(tufte) # so that we can celebrate how amazing Edward Tufte is and how great his books look

Introduction

How much does education levels influence support for Trump? How likely is a senator to vote in favor of Trump considering 2016 election results and the education levels in their state? To answer these question we used FiveThirtyEight’s Tracking Trump In Congress dataset and the U.S. Census Bureau Educational Attainment Survey.

FiveThirtyEight has created three different scores to indicate how senators vote with respect to Trump. The Trump Score represents the percentage of how often a senator votes yes or no and agrees with Trump. The Predicted Score represents how often we would expect a senator agree with Trump based on 2016 election results. The Trump Plus-Minus is derived from the Trump Score minus the Predicted Score. More info on this can be found on FiveThirtyEight here.

Using these scores we will be able to determine the probability of senators voting with the President compared to their constituents’ education levels. We will be analyzing the combined 115th and 116th Congress during the time Donald Trump has been in office.

Data Preparation and Cleaning

We downloaded FiveThirtyEight’s data from their GitHub repo, and loaded it into R with the function read_csv. The data consists of two tables, vote_predictions in which an observation is a representative’s vote, and averages, in which an observation is a representative in a particular session. For this project, we will be using the averages table, as we are interested in the senators’ voting patterns, rather than specific votes.

We then filtered observations to only include Senators from the most recent Congress.

averages <- read_csv("/Users/mcconvil/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp9/averages.csv") %>%
  filter(congress == 0,
         chamber == "senate") %>%
  # drop:
  # chamber because it is "senate" for all observations
  # district because senators are elected on the state level
  # congress because it is "0" for all observations
  select(-chamber, -district, -congress) %>%
  mutate(party = as_factor(party),
         state = as_factor(state))

We used the R Package tidycensus to grab education data from the US Census Bureau. tidycensus helps R users download data from the census without having to learn how to use the census API.

education <- get_acs("State",
                     table = "C15003",
                     output = "wide",
                     survey = "acs1",
                     key = api_key)

After that, we manipulate the columns to create useful, human-readable variables. These are:

% Less than High School
% High School
% GED
% Some college
% College
% Graduate education

We then aggregate these variables into broader categories:

% highschool or less
% bachelors or more

With these aggregated variables, we will be able to determine what impact education levels have on a state’s support for Trump and the voting patterns of senators from that state.

education <- education %>%
 rename(population = C15003_001E) %>%
  # create the small education bins
  mutate(education_small_bin_percent_less_than_highschool = (C15003_002E + C15003_003E + C15003_004E + C15003_005E + C15003_006E + C15003_007E + C15003_008E + C15003_009E) / population,
         education_small_bin_percent_highschool = C15003_010E / population,
         education_small_bin_percent_ged = C15003_011E / population,
         education_small_bin_percent_some_college = (C15003_012E + C15003_013E + C15003_014E) / population,
         education_small_bin_percent_college = C15003_015E / population,
         education_small_bin_percent_graduate = (C15003_016E + C15003_017E + C15003_018E) / population) %>%
  # add up the small bins into the bins we'll use
  mutate(education_high_school_or_less = education_small_bin_percent_ged + education_small_bin_percent_highschool + education_small_bin_percent_less_than_highschool,
         education_bachelors_or_more = education_small_bin_percent_college + education_small_bin_percent_graduate) %>%
  # rename the variable to make future merging easier
  rename(state = NAME) %>%
  # drop the variables we no longer need (the raw estimates and margins of error)
  select(GEOID, state, population, starts_with("education"))

In order to draw maps, we will require a base map. A base map is a blank map that data can be mapped onto. We can download one from the tigris package with the following code:

# download state geometry files
base_map <- states(cb = TRUE, class = "sf", progress_bar = FALSE) %>%
  select(geometry, state = STUSPS, NAME)

theme_tufte <- theme(panel.background = element_rect("#fffff8", "#fffff8"),
                     plot.background = element_rect("#fffff8", "#fffff8"))

We then aggregate the voting data by state, use joins to add in a base map and education data, and drop Alaska and Hawaii to simplify map-drawing.

df <- averages %>%
  group_by(state) %>%
  summarize(trump_vote = mean(net_trump_vote),
            predicted_agree = mean(predicted_agree),
            actual_agree = mean(agree_pct)) %>%
  left_join(base_map, by = c("state" = "state")) %>%
  rename(state = NAME, state_code = state) %>%
  left_join(education, by = c("state" = "state")) %>%
  filter(!(state %in% c("Hawaii", "Alaska"))) %>% # exclude Alaska and Hawaii because they don't fit on the map
  select(-GEOID) # we won't need it

## Warning: Column `state` joining factor and character vector, coercing into
## character vector

Where did Trump do well in 2016?

Now that we’ve cleaned and prepared our data, we’re ready to draw some maps. ggplot2 can draw maps from tigris or tidycensus using geom_sf. Let’s start by drawing a map of Trump’s support in the 2016 election.

ggplot(df, aes(fill = trump_vote, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(fill = "Margin (%)",
       title = "Trump's Share of the 2016 Election Votes Minus Clinton's") +
  theme_tufte

Based off of the map, it is apparent that Trump has significantly more support in the middle of the country than either coast. Wyoming and California stand out as the most polarized.

Does having a bachelor’s influence support for Trump?

Now let’s test whether or not having a bachelor’s degree or more influences support for Trump. We can do this by running a regression, which tests whether or not an explanatory variable affects an outcome. We can then grab the results of the regression with the function tidy from the broom package, which helps R users work with regressions. As shown by the table below, states where a higher percentage of the population has a bachelor’s degree or more cast relatively fewer votes for Donald Trump in 2016.

tidy(lm(trump_vote ~ education_bachelors_or_more, df)) %>%
  # drop the intercept term, as we are only interested in the direction of the effect
  filter(term != "(Intercept)") %>%
  select(term, estimate) %>%
  # make the explanatory variable name nice and human readable
  mutate(term = if_else(term == "education_bachelors_or_more", 
                        "Having a bachelor's degree or more",
                        term)) %>%
  kable(col.names = c("Explanatory Variable", "Effect on Voting for Trump")) %>%
  kable_styling(position = "float_left")

Explanatory Variable	Effect on Voting for Trump
Having a bachelor’s degree or more	-269.9827

Now let’s look at which states bucked this trend. Regressions produce residuals, meaning the difference between what the explanatory variable predicts and what actually happened. We can extract the residuals using the base R function resid and attach them to the dataframe using a mutate call.

# confirmed that this works: https://stackoverflow.com/questions/20506984/are-residuals-in-linear-regression-following-same-order-of-original-data-frame-r
df <- df %>%
  mutate(college_or_more_support_residual = resid(lm(trump_vote ~ education_bachelors_or_more, df)))

Now that we’ve made the residuals a column in the dataframe, we can draw a map of them. This allows us to see which states voted for Trump more or less than their education levels would predict.

ggplot(df, aes(fill = college_or_more_support_residual, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(title = "Which states voted for Trump more than their education levels predicted?",
       fill = "Positive is More") +
  theme_tufte

From the above map, we can see that certain states, such as Wyoming, voted for Trump at much higher levels than expected, whereas Trump underperformed in Nevada, California, and New Mexico relative to those states’ education levels. Let’s examine these observations (and Ohio as a control) to see what is going on. To do so, we’ll subset the data using a filter.

df %>%
  filter(state %in% c("Wyoming", "Ohio", "California", "Nevada", "New Mexico")) %>%
  mutate(education_bachelors_or_more = education_bachelors_or_more * 100) %>%
  select(state, trump_vote, education_bachelors_or_more, college_or_more_support_residual) %>%
  kable(col.names = c("State", "Trump Vote Share", "Bachelors or More %", "Vote Relative to Education")) %>%
  kable_styling(position = "float_left")

State	Trump Vote Share	Bachelors or More %	Vote Relative to Education
Ohio	8.129574	28.97222	-5.009677
Wyoming	46.295276	26.85661	27.444249
Nevada	-2.417128	24.88984	-26.578082
California	-30.109293	34.20047	-29.133167
New Mexico	-8.213268	27.65388	-24.911794

Looking at the table, it appears that Wyoming is an outlier simply because it voted overwhelmingly for Trump. This is likely due to its extremely rural, conservative nature. The opposite is true of California. In contrast, both Nevada and New Mexico simply voted less for Trump than their education levels would predict.

Do senators vote differently depending on their constituents’ education levels?

ggplot(df, aes(fill = actual_agree, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(title = "Which senators vote with Trump?",
       fill = "% of the Time") +
  theme_tufte

Now let’s examine whether or not a senator’s constituents’ education level affects their voting pattern. First, we’ll run a regression to see how it affects voting patterns. As shown by the table below, senators with a more highly educated constituency vote with Trump less often.

tidy(lm(actual_agree ~ education_bachelors_or_more, df)) %>%
  filter(term != "(Intercept)") %>%
  select(term, estimate) %>%
  mutate(term = if_else(term == "education_bachelors_or_more",
                        "Having a bachelor's degree or more",
                        term)) %>%
  kable(col.names = c("Explanatory Variable", "Effect on Senator Agreeing with Trump")) %>%
  kable_styling(position = "float_left")

Explanatory Variable	Effect on Senator Agreeing with Trump
Having a bachelor’s degree or more	-3.758465

After adjusting for education, the map looks like:

df <- df %>%
  mutate(college_or_more_senator_residual = resid(lm(actual_agree ~ education_bachelors_or_more, df)))
ggplot(df, aes(fill = college_or_more_senator_residual, geometry = geometry)) +
  geom_sf() +
  theme_map +
  xlim(-125, -68) +
  ylim(23, 50) +
  scale_fill_distiller(type = "div", palette = "RdBu") +
  labs(title = "Educated-adjusted Senator Voting Patterns",
       fill = "Voting with Trump") +
  theme_tufte

After adjusting for education, there are some major changes in senators’ voting patterns. The partisan lean of most of the south is significantly reduced, suggesting that a lack of education among their constituents explains why southern senators vote with Trump. Georgia and North Carolina buck this trend. Anecdotally, in North Carolina this is likely due to the highly educated members of the population being concentrated in the Raleigh-Durham-Chapel Hill metro area. Interestingly, senators from Utah, Colorado, Kansas, and Nebraska all vote with Trump much more often than education would predict. we suspect this also has to do with geographic clustering of the highly educated members of their populations. Finally, senators from New Mexico vote for Trump far less than education would predict.

Conclusion

While education did a fairly good job of explaining voting for Trump at both the constituent and senator level in some states, it did not in others. Geographic distribution of the highly educated part of the population explains this disparity in a few states, such as North Carolina. In the remaining states, other factors must be at play. Scholarly work on the other predictors of support for Donald Trump can be read here, here, or here.