Here are my predictions for today's Democratic primaries in Kentucky and Oregon.
In the national polls, Sanders has been losing 0.5-1% per week versus Clinton, and the Google search trends reflect this. Previously Sanders was getting around 65% of the combined Sanders + Clinton search trend, but this has now fallen to around 56%. For instance, even though Kentucky is one of the few states with an active campaign left, Sanders got only 53.5% of the searches there in the past 7 days.
For WV I'm predicting Sanders 55.3%, Clinton 44.7% (percent of the shared Sanders + Clinton vote).
My latest Indiana prediction for the Democratic nomination is Sanders 52.6%, Clinton 47.4%.
Sanders is doing very well in the search trend.
The model includes the polling data, FB likes, Google search trends, past election results, and demographics (race, income, sex, age, education).
Tyler Pendigo - Sanders 52.6 / Clinton 47.4
Benchmark Politics - Clinton 51.4 / Sanders 48.2 (Other: 0.4%); rebalanced to a two-way share, that is Clinton 51.6 / Sanders 48.4
Pollster Average - Clinton 54.1 / Sanders 45.9
PredictIt - Clinton 72% chance of winning.
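The rebalancing applied to the Benchmark Politics numbers above just rescales the two candidates' shares so they sum to 100. Here is a minimal sketch of that arithmetic; the function name `two_way_share` is my own label for illustration, not something from any of these forecasters.

```python
def two_way_share(clinton, sanders):
    """Rescale two candidates' percentages so they sum to 100.

    Hypothetical helper: drops the "Other" share and renormalizes.
    """
    total = clinton + sanders
    if total <= 0:
        raise ValueError("combined share must be positive")
    return 100 * clinton / total, 100 * sanders / total


# Benchmark Politics: Clinton 51.4 / Sanders 48.2 (Other: 0.4)
clinton, sanders = two_way_share(51.4, 48.2)
print(f"Clinton {clinton:.1f} / Sanders {sanders:.1f}")
# prints "Clinton 51.6 / Sanders 48.4"
```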
Early projections for upcoming primaries. Later projections are likely to be more accurate as they can incorporate polling data. I'm not using polling data at this time.
Indiana: Sanders: 50.3 / Clinton 49.7. Sanders search trend is strong, probably due to advertising.
Kentucky: Clinton 57.0 / Sanders 43.0. Sanders is down 1% from my last projection.
Oregon: Sanders 62.2 / Clinton 38.8. Sanders is down 8% from my projection a month ago. This includes the start of Sanders' big rally in Oregon (and the Google search trend boost).
My early model for Indiana is projecting: Sanders 50.6 / Clinton 49.4
This puts Sanders down 2.8 points from the previous 53.4% projection. It also disagrees with the polls, 538, and PredictIt, which all say Clinton will win.
I added a PredictIt variable to the model. I use the PredictIt close price for Clinton (percent chance of winning) the day before the election. Then I convert it into standard deviations (which makes it linearly correlated with the predicted percentage). Turns out it is a fairly minor factor, except for caucuses.
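One way to read "convert it into standard deviations" is as the inverse of the standard normal CDF (a probit transform), which maps a win probability to a z-score. The sketch below assumes that interpretation; the function name `price_to_z`, the clipping threshold `eps`, and the example price of 0.72 are illustrative assumptions, not details of the actual model.

```python
from statistics import NormalDist


def price_to_z(price, eps=0.01):
    """Map a market price (probability in 0..1) to a z-score.

    Assumption: "standard deviations" means the inverse standard normal
    CDF. Prices are clipped to [eps, 1 - eps] so that near-certain
    markets do not produce infinite values.
    """
    p = min(max(price, eps), 1 - eps)
    return NormalDist().inv_cdf(p)


# An illustrative close price of 72 cents for Clinton
z = price_to_z(0.72)
print(f"z = {z:.2f}")  # a 50% price would map to z = 0
```

The clipping is a design choice: a market trading at 99 cents or more carries little extra information, and without a cap the transform would blow up as the price approaches 0 or 1.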
I also updated my poll numbers and 7 day search trend data.
Take this with a huge grain of salt. Hopefully my real-time county swing model will do better.
My county model for primary-only states (more accurately, one of the models, as the situation is more of a continuum) came up with a prediction of Clinton 62.0 / Sanders 38.0 for NY.
That said, the main goal was to predict county values that would let me use my real-time county swing analyzer to predict the state swing.
I decided to create a state level model for predicting the 2016 Democratic nomination race.
So I've aggregated my county level data into state level variables. I also added election results for WY (actual votes - from http://www.crowdsourcingdemocracy.org), KS, and AK (state delegates).
Note the election results for WA are legislative district delegates, and for ME they are state delegates - not popular vote.
My various state models give different results for NY. I've got Sanders at anywhere from 33% to 46%.
I think that given the number of polls, it is likely that a purely polling based forecast will prove the most accurate for NY.
Currently Pollster has it at 56.2% Clinton / 43.8% Sanders.
My demographics models are significantly more pro-Clinton, with Clinton at around 66%. It is possible that the truth will lie somewhere in between, in which case we could see a 60/40 split.
If you want to make your own county-level model for the 2016 Democratic nomination race, you can use my data set.
I would LOVE to hear from anyone who is using this. What does your model look like? What variables are you including? What are your predictions? What additional factors are you adding to the model that I don't have?