My model for predicting the 2016 Democratic Primary/Caucus Results

I'm developing a model for predicting the Sanders vote share in the upcoming 2016 Democratic Party primaries and caucuses. Unlike others, I've chosen to build a county-level model. Ultimately this can be used for real-time analysis of votes as they come in on election night. If anyone wants to work on developing this model, I would LOVE to hear from you. I've got an early version of the model (and a real-time "vote swing" analyzer), but it needs work. Notably, I need a method for estimating county-level turnout so I can translate the county swings into a statewide swing.
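To make the turnout problem concrete, here is a minimal sketch of how county swings could be rolled up into a statewide swing once some turnout estimate exists. It assumes a county's "swing" is its observed Sanders share minus the model's predicted share, and that the statewide swing is the turnout-weighted average of those. The function name, the dictionary keys, and the numbers are all hypothetical, and the turnout figures are exactly the piece I don't have a good method for yet.

```python
def statewide_swing(counties):
    """Turnout-weighted average of county swings.

    Each county is a dict with hypothetical keys:
      'predicted' - model's predicted Sanders share (two-way, 0 to 1)
      'actual'    - observed Sanders share so far (two-way, 0 to 1)
      'turnout'   - estimated number of votes the county will cast
    """
    total_turnout = sum(c["turnout"] for c in counties)
    if total_turnout <= 0:
        raise ValueError("need a positive total turnout estimate")
    weighted = sum(
        c["turnout"] * (c["actual"] - c["predicted"]) for c in counties
    )
    return weighted / total_turnout


# Made-up numbers, for illustration only.
example = [
    {"predicted": 0.40, "actual": 0.45, "turnout": 1000},
    {"predicted": 0.55, "actual": 0.53, "turnout": 3000},
]
swing = statewide_swing(example)
```

In this toy case the small county swings toward Sanders and the large county swings away, so the weighted result is slightly negative. That is why the turnout weights matter so much.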

If anyone has a county-level model they'd like to share, or some county variables that I should add, please email me.

So here is a county-level data set that others might find useful:
My Data

This county-level data includes the following states with primaries: AR, GA, LA, MI, MS, OK, SC, TN, TX, and VA. It excludes VT and MA, as I was unable to find county-level election results for them (email me if you have this data!).

I included the following states with caucuses: IA, NV, CO, and NE.
I wasn't able to find county-level data for the caucuses in KS and ME.

Most of the demographic variables are from the American Community Survey (2010-2014, 5-year summary).

-FIPS: county FIPS code
-Bernie: number of Bernie votes
-Hilary: number of Hillary votes
-pBernie: Bernie votes / (Bernie votes + Hillary votes)
-Income: median household income
-Pop: total population
-White: number of white people (includes Hispanics)
-Black: number of black people (includes Hispanics)
-Indian: number of American Indians (includes Hispanics)
-Asian: Asian Americans (includes Hispanics)
-Island: Pacific Islander and Hawaiian (includes Hispanics)
-Other: Some other race (includes Hispanics)
-Multi: Two or more races (includes Hispanics)
-pwhite, pblack, ...: the race counts above expressed as percents
-pwhite2: percent white non-hispanic
-phispanic: percent Hispanic (includes whites)
-age: median age
-incomeblack, incomeindian, incomeasian, incomewhite, incomehispanic: median household income by race (not so useful due to missing data)
-m1829: percent men age 18-29
-m3044: percent men age 30-44
-m4564: percent men age 45-64
-m65plus: percent men age 65+
(and same variables for women -- f1829...)
-bpoll: latest poll average (Bernie support / (Bernie support + Hillary support)), from Pollster if available, otherwise from 538, otherwise missing
-prehs: percent with no education or less than a high school education
-hs: high school degree or GED equivalent
-somecollege: partial college or associates degree
-college: college degree
-postcollege: masters, doctorate, professional degree
-collegeplus: combines college with postcollege
-bike: percent of commuters who use bikes
-poverty: percent of households below the poverty line
-closed: ignore this - was using it for closed primaries, but lack the sample size
-density: population density
-income_change: the ratio of median household income in the 2010-2014 ACS to that in the 2007-2011 ACS. A value greater than 1 means that income (adjusted for inflation) increased.
-union: percent of union members in the STATE. From the BLS.
-Obama2008: Obama vote / (Obama vote + Republican vote) from 2008. I couldn't find a source for 2012 county-level data.
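
For anyone rebuilding the derived columns from the raw counts, here is a rough sketch of the arithmetic for one county row. It uses the column names from the list above (including the "Hilary" spelling used in the data). The function name is hypothetical, and it assumes the race shares are taken over Pop and returned as fractions between 0 and 1. The file itself may store them on a 0-100 scale, so check before mixing the two.

```python
def derived_shares(row):
    """Compute pBernie and a couple of race shares for one county.

    `row` is a dict keyed by the raw column names listed above.
    Returns fractions (0 to 1); None marks a share that cannot be
    computed because its denominator is zero.
    """
    two_way = row["Bernie"] + row["Hilary"]
    out = {"pBernie": row["Bernie"] / two_way if two_way else None}

    pop = row["Pop"]
    # Assumption: shares are taken over total population (Pop).
    for count_col, share_col in (("White", "pwhite"), ("Black", "pblack")):
        out[share_col] = row[count_col] / pop if pop else None
    return out


# Made-up county, for illustration only.
county = {"Bernie": 300, "Hilary": 700, "Pop": 10000,
          "White": 6000, "Black": 2500}
shares = derived_shares(county)
```

Counties with no recorded votes come back with pBernie set to None rather than raising an error, so they can be dropped or handled as missing later.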

In general the most important variables are: race (not just black, but white, Hispanic, American Indian, multiracial, and other), union, the Bernie poll average, age (which can be split into various age groups; notably, Sanders does well with men 45 to 64 and with women 18 to 29), and education. Believe it or not, the percent of commuters who are cyclists is often significant. Sanders does surprisingly well in states that have high union membership rates (funny, because most of the unions are officially endorsing Clinton), and in areas that have more multiracial people or Native Americans.
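
If you want a quick first look at how a single variable lines up with pBernie before fitting anything, a plain Pearson correlation is one rough way to do it. This is only a screening sketch, not the model itself: it looks at one variable at a time, ignores county size, and simply skips counties where either value is missing. The function name and the numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists.

    Pairs where either value is None (missing) are skipped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for four counties, for illustration only.
bike = [0.1, 0.4, None, 1.2]
p_bernie = [0.30, 0.35, 0.50, 0.48]
r = pearson(bike, p_bernie)
```

A high correlation here does not mean the variable will stay significant once race, age, and education are in the model together, so treat it as a way to spot candidates and nothing more.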

Interested? Email me!