2016 Presidential Election: Education, Race, Age, Density and other Factors

Early findings on the 2016 Presidential Election
I have a county-level model that I used to predict the outcome of the Democratic Presidential Primary ahead of the election, and also a real-time model that estimated the state-wide swing based on the county swings. I used this model to make a significant amount of money betting/predicting the outcome on PredictIt.org. I also created my own real-time model for the presidential general election and used that (and the NYT real-time model) to make more money on PredictIt.

This county model includes variables like age and sex (broken into age/sex groups, like women 18-21, 22-29, etc.), race, education, income, density, past voting statistics (2000, 2008, 2012), and Facebook likes (from the primaries). The demographic variables all come from the American Community Survey.

For the election data, I am using the CNN results from election day (and the 12-24 hours after, while they continued to update them), which include 123,291,124 votes for Clinton or Trump. This data excludes the late votes: for instance, Election Atlas currently has 127.1 million votes counted, and I expect another 1 million votes to be added. I also exclude Alaska because it reports votes by state electoral districts, which differ from counties.

I apply a county-level linear regression model to analyze the factors that can explain Clinton's percent of the Clinton + Trump vote (i.e., y = Clinton / (Clinton + Trump)). Since Trump's percent is simply 1 - y, the same model also explains Trump's percent.
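A minimal sketch of the target variable, assuming raw county vote counts as input; the function names are illustrative and not from the original analysis. It also shows why modeling Clinton's share is equivalent to modeling Trump's: the two shares always sum to 1.

```python
def clinton_share(clinton_votes: int, trump_votes: int) -> float:
    """Clinton's share of the two-party (Clinton + Trump) vote in one county."""
    total = clinton_votes + trump_votes
    if total == 0:
        raise ValueError("county has no Clinton or Trump votes")
    return clinton_votes / total


def trump_share(clinton_votes: int, trump_votes: int) -> float:
    """Trump's two-party share is the complement of Clinton's."""
    return 1.0 - clinton_share(clinton_votes, trump_votes)
```

Third-party votes never enter either share, which is why the analysis only needs the Clinton and Trump columns.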

I am using stepwise regression in SPSS, which adds the most significant variable one at a time to build a model. I weight the counties by the total number of votes cast for Clinton and Trump.
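The procedure can be sketched as weighted forward selection in Python with NumPy. This is not the SPSS implementation: it is a simplified stand-in, assuming a county matrix `X` (one column per candidate variable), the Clinton share `y`, and weights `w` equal to Clinton + Trump votes. For a fixed set of already-selected variables, the candidate with the largest R^2 gain also has the largest partial F statistic, so the order of entry should match a forward step; however, the stopping rule here is a simple R^2-gain threshold (`min_gain`, a hypothetical parameter) rather than SPSS's p-value criteria, it never removes a variable, and it tracks plain weighted R^2 rather than adjusted R^2.

```python
import numpy as np


def weighted_r2(X, y, w):
    """Weighted least-squares fit of y on an intercept plus the columns of X;
    returns the weighted R^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # intercept + predictors
    sw = np.sqrt(w)
    # WLS is ordinary least squares after scaling each row by sqrt(weight)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    resid = y - A @ beta
    ybar = np.average(y, weights=w)           # vote-weighted mean share
    ss_res = np.sum(w * resid ** 2)
    ss_tot = np.sum(w * (y - ybar) ** 2)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X, y, w, names, max_steps=10, min_gain=0.001):
    """Greedy forward selection: at each step add the candidate column that
    raises the weighted R^2 the most; stop when the gain is below min_gain.
    Returns a list of (variable name, cumulative R^2) in order of entry."""
    selected, history = [], []
    current_r2 = 0.0
    remaining = list(range(X.shape[1]))
    for _ in range(max_steps):
        if not remaining:
            break
        # Try each remaining column alongside the ones already chosen
        best_r2, best_j = max(
            (weighted_r2(X[:, selected + [j]], y, w), j) for j in remaining
        )
        if best_r2 - current_r2 < min_gain:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        history.append((names[best_j], best_r2))
        current_r2 = best_r2
    return history
```

In the county setting, `names` would hold labels such as the Obama 2008 share or percent high school graduates, and `history` would list variables in the order they enter, with their running R^2.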

Model #1
1. Obama Vote in 2008
The percent of the vote that Obama got in 2008 (per county) is a stronger factor than the Obama 2012 vote, though only by a small amount, which is a bit surprising. This one variable explains 89.5% of the variation (adjusted R^2 = 0.895), so only 10.5% remains to be explained. Clinton received strong support from Obama 2008 and 2012 voters, which indicates that most people voted for the same party as in previous elections. It only takes 3-7% of voters switching parties to swing an election, and the Trump victory is not due to a large shift in voter support (Clinton is probably going to win the popular vote by 1.8%).
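For reference, adjusted R^2 penalizes R^2 for the number of predictors p relative to the number of observations n. With a county-level model (on the order of 3,000 counties, an approximate figure not stated in the post) and only a handful of predictors, the penalty is tiny, so adjusted and plain R^2 are nearly identical here. A small helper, with the county count as an assumption:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors (plus an intercept) fit on
    n observations; it penalizes R^2 for each extra predictor."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: about 3,000 counties, one predictor (Obama 2008 share)
print(round(adjusted_r2(0.895, 3000, 1), 4))
```

The difference only shows up around the fourth decimal place, so the "89.5%" reading is not sensitive to which version of R^2 is reported.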

2. High School Graduates
The second strongest factor is the percent of people who graduated from high school but did not attend college. These people voted for Trump. This factor adds 5.8% to the model (for a combined total of 95.3%). Education played a key role in the 2016 election, probably a much larger one than in other recent presidential elections.

3. Percent White
White people were more likely to vote for Trump than for Clinton, even after controlling for the Obama 2008 vote. Another way of saying this is that white people were more likely to vote for Trump than for Romney. This adds 1.8% to the model for a combined R^2 of 0.971.

4. College Graduates
College graduates were more likely to vote for Clinton (even after controlling for high school graduates). This adds 0.4% to the model for a combined R^2 of 0.975.
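As a quick arithmetic check on the step-by-step gains reported for Model #1, the running totals below should reproduce the combined R^2 values given in the text (0.953, 0.971, 0.975) along with the shrinking unexplained share. The numbers are copied from the text; the helper itself is illustrative.

```python
# Step-by-step R^2 gains for Model #1, as reported in the text
MODEL_1_STEPS = [
    ("Obama vote in 2008", 0.895),
    ("High school graduates", 0.058),
    ("Percent white", 0.018),
    ("College graduates", 0.004),
]


def cumulative_r2(steps):
    """Turn per-step R^2 gains into running totals, rounded to 3 places."""
    total = 0.0
    running = []
    for name, gain in steps:
        total += gain
        running.append((name, round(total, 3)))
    return running


for name, r2 in cumulative_r2(MODEL_1_STEPS):
    print(f"{name:<25} R^2 = {r2:.3f}  unexplained = {1 - r2:.3f}")
```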

At this point the additional factors become increasingly minor:
5. Romney 2012 vote: Romney 2012 voters were more likely to vote for Trump (this might capture some of the demographic and political shifts between 2008 and 2012).
6. Trump Facebook likes: these increased Trump support.
7. Post-college degrees: these favored Clinton, at a slightly lower level than college graduates.
8. Log of population density: Clinton did better in areas with a higher population density.
9. Population density (without the log): Trump did better in areas with high population density when you don't take the log (I wouldn't worry about figuring this one out).
10. Nader 2000 vote: Clinton did better with Nader 2000 voters (possibly because, by election day, Stein's support had fallen even more dramatically than Johnson's).
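Because the eighth and ninth factors both involve density, their combined effect can be read as a single curve. A sketch, holding the other variables fixed, where \(\hat{y}\) is Clinton's predicted share, \(d\) is population density, and \(\beta_1, \beta_2\) are hypothetical symbols for the fitted coefficients (the post does not report their values):

```latex
\hat{y} = \dots + \beta_1 \ln d + \beta_2 d
\qquad\Longrightarrow\qquad
\frac{\partial \hat{y}}{\partial d} = \frac{\beta_1}{d} + \beta_2,
\qquad
\frac{\partial \hat{y}}{\partial d} = 0 \iff d^{*} = -\frac{\beta_1}{\beta_2}
```

If the signs described above hold in the final model (\(\beta_1 > 0\), \(\beta_2 < 0\)), the slope is positive below \(d^{*}\) and negative above it. One possible reading is that Clinton's advantage rises with density but levels off in the very densest counties; this is only an interpretation, since correlated terms like these can also pick up opposite signs for purely statistical reasons.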

What is interesting is that we don't see any age or sex effects until the seventeenth and eighteenth variables (where women 45-64 go for Clinton and men 45-64 for Trump). Income is also not a factor. I was surprised that percent Hispanic was not significant, and that percent Asian was actually correlated with Trump support!

Model #2
For the second family of models, I removed the past-election variables (percent vote for Obama, Romney, other candidates, etc.). I expected race to be the most significant factor, but I was wrong!

1. Facebook Likes
Clinton and Trump FB likes explained 85.4% of the outcome!

2. Percent Black
Percent black explains an additional 2.6% of the outcome.

The other factors are density (not the log), percent multiracial, women 45-64, commuters who cycle to work, and men 18-21. Density was positively correlated with Trump support (odd, but this is the raw density rather than its log, and the log is generally more useful). The other factors were all positively correlated with Clinton support.

Model #3
When I removed FB likes, the most important factor was the log of density, which explained 50.7% and was positively correlated with Clinton support.

Model #4
When I removed density, the most important factor was percent white, which explained 45.8%. Clearly race is strongly correlated with density: percent white is negatively correlated with density.
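For a weighted regression with a single predictor plus an intercept, the weighted R^2 equals the square of the weighted Pearson correlation. So Model #3's 50.7% corresponds to a correlation of roughly 0.71 in magnitude between log density and Clinton's share, and Model #4's 45.8% to roughly 0.68 for percent white (treating the reported figures as approximately plain R^2). A minimal sketch, where `pct_white`, `log_density`, and `votes` in the usage note are hypothetical column names:

```python
import math


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with non-negative weights w
    (e.g. Clinton + Trump votes per county)."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # The 1/total_w factors cancel in the ratio, so they are omitted below
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)
```

With county data, `weighted_corr(pct_white, log_density, votes)` would quantify the race/density relationship (expected to be negative, per the result above), and squaring `weighted_corr(log_density, clinton_share, votes)` should approximately reproduce the single-variable R^2 of Model #3.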

I haven't read any multivariate analysis of the election result. 538 and others have looked at single variables, but I haven't seen anyone build a model, so I decided to release my unpolished one.

Basically, this election result can be explained as a typical election with education (especially) and race (secondarily) playing a larger role than in the past. It'd be interesting to break down the high school graduates who voted for Trump. Normally high school graduates would be more likely to be people of color and/or low-income (as education, income, and race are all correlated), but I suspect that in this election they were white. I'm uncertain whether their income would be higher or lower than that of typical high school grads.

Typically, higher education is a strong predictor of turnout. However, in this case the likely voter models used by pollsters may have failed to capture the unusually high enthusiasm of predominantly white high school graduates, and thus underestimated Trump's vote by around 1.3% nationally and failed to project a Trump win.