Predictions: CA, MT, NJ, ND, NM, and SD

Here are my predictions for the June 8 Democratic presidential primaries.

This is from my county-level model, which includes past election results, age groups, sex, education, FB likes, Google searches, density, income, caucus/primary, whether 17-year-olds can vote, and more.

There is a high degree of uncertainty because the AP declared yesterday that Clinton has secured enough delegates for the nomination. Even before that, the campaign was winding down. This is noticeable in the national polls and especially in the search trend, where Sanders has been losing a lot of ground.

There were a lot of polls for CA, but their results varied widely, with gaps ranging from 2 to 18 percentage points. The most recent polls forecast a 2-point gap (which differs from my model and the other demographic/social network models). The NJ polls were more consistent and showed a strong Clinton lead. There was a single SD poll, by Targeted Persuasion, which had Clinton up by 3 points. As there is very little information about this pollster, I'm ignoring it, though it does agree with my model.

For MT, ND, and NM there weren't any polls, so my model for them and for SD relies heavily on the search trend. The search trend had a massive spike in MT, SD, and ND in mid-May, but I use the 7-day search trend, which excludes that spike.

CA: Clinton 54.2 / Sanders 45.8 (Benchmark Clinton 54 / Sanders 46)
MT: Sanders 57.4 / Clinton 42.6 (Benchmark Sanders 62 / Clinton 38)
NJ: Clinton 68.7 / Sanders 31.3 (Benchmark Clinton 64 / Sanders 36)
NM: Clinton 56.8 / Sanders 43.2 (Benchmark Clinton 57 / Sanders 43)
ND: Sanders 54.0 / Clinton 46.0 (Benchmark Sanders 60 / Clinton 40)
SD: Clinton 54.4 / Sanders 45.6 (Benchmark Sanders 57 / Clinton 43)

My forecast of Clinton winning SD is an outlier (so much so that I'm wondering if there is a typo in the model; or it could be picking up on the trend that made it the only state in the region that Clinton won in 2008; or it could be that the age groups are bad for Sanders). In general, I have Clinton doing better in MT, SD, NJ, and ND than Benchmark does, and very similar predictions for CA and NM (both within 0.2%).

I have a theory that Sanders supporters and volunteers are losing enthusiasm because he is losing the race, and that this can be seen in the polls and the search trend. If your model does not have any variables that can track support over time, then it will be less accurate.

I'm very happy that I was one of the few forecasters to predict a Clinton victory in SD.

That said, Benchmark Politics was more accurate overall, but not by a lot. Everyone suffered from a lack of polling and the dynamics changing as the campaign ended.

CA: Clinton 56.4 / Sanders 43.6 - My error 2.2%
MT: Sanders 53.4 / Clinton 46.6 - My error 4%
NJ: Clinton 63.3 / Sanders 36.7 - My error 5.4%
NM: Clinton 51.5 / Sanders 48.5 - My error 5.3%
ND: Sanders 71.5 / Clinton 28.5 - My error 17.5% (Caucuses are hard)
SD: Clinton 51.0 / Sanders 49.0 - My error 3.4%
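
The comparison with Benchmark can be checked with a few lines of Python. This is only a sketch: it uses Clinton's two-way share copied from the lists in this post, and the dictionary and function names are made up for the illustration. Averaging the absolute errors is one reasonable way to score "overall" accuracy, not the only one.

```python
# Clinton's two-way share in percent, copied from the lists in this post.
my_forecast = {"CA": 54.2, "MT": 42.6, "NJ": 68.7, "NM": 56.8, "ND": 46.0, "SD": 54.4}
benchmark = {"CA": 54.0, "MT": 38.0, "NJ": 64.0, "NM": 57.0, "ND": 40.0, "SD": 43.0}
actual = {"CA": 56.4, "MT": 46.6, "NJ": 63.3, "NM": 51.5, "ND": 28.5, "SD": 51.0}


def abs_errors(forecast, result):
    """Absolute error per state, rounded to one decimal to hide float noise."""
    return {state: round(abs(forecast[state] - result[state]), 1) for state in result}


def mean_abs_error(forecast, result):
    """Average of the per-state absolute errors."""
    errors = abs_errors(forecast, result)
    return sum(errors.values()) / len(errors)


my_errors = abs_errors(my_forecast, actual)
benchmark_errors = abs_errors(benchmark, actual)

for state in actual:
    print(f"{state}: mine {my_errors[state]:.1f}, Benchmark {benchmark_errors[state]:.1f}")

print(f"Mean error: mine {mean_abs_error(my_forecast, actual):.1f}, "
      f"Benchmark {mean_abs_error(benchmark, actual):.1f}")
```

On these numbers the mean error is roughly 6.3 points for my model and 6.1 for Benchmark, so the gap is small and ND accounts for a large part of both totals.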

For the primaries, I under-estimated Clinton in CA and MT (6.2% total) and over-estimated her in NJ, NM and SD (14.1% total). The fact that I also over-estimated Clinton by a massive 17.5% in the ND caucus fits with the theory that my model was too reliant upon the 7-day search trend. Sanders had a big downturn in that search trend in all of these states except CA, where his advertising and campaigning likely kept it higher.
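
The direction of each miss can be separated out by keeping the sign of the error instead of taking its absolute value. The sketch below assumes Clinton's two-way share from the lists above and treats ND separately because it was a caucus; the variable names are hypothetical.

```python
# Clinton's two-way share in percent: my forecast and the first reported result.
forecast = {"CA": 54.2, "MT": 42.6, "NJ": 68.7, "NM": 56.8, "ND": 46.0, "SD": 54.4}
actual = {"CA": 56.4, "MT": 46.6, "NJ": 63.3, "NM": 51.5, "ND": 28.5, "SD": 51.0}

# Positive means I had Clinton too high, negative means too low.
signed = {state: round(forecast[state] - actual[state], 1) for state in forecast}

# ND was a caucus, so keep it out of the primary totals.
primaries = [state for state in signed if state != "ND"]

under_total = round(sum(-signed[s] for s in primaries if signed[s] < 0), 1)
over_total = round(sum(signed[s] for s in primaries if signed[s] > 0), 1)

print("Signed errors:", signed)
print("Under-estimated Clinton (primaries):", under_total)
print("Over-estimated Clinton (primaries):", over_total)
print("ND caucus:", signed["ND"])
```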

Updated vote counts (using Secretary of State pages - as CNN and NYT do not have the most recent results)

CA: Clinton 55.1 / Sanders 44.9 - My error falls from 2.2% to 0.9%
MT: Sanders 53.8 / Clinton 46.2 - My error falls from 4% to 3.6%
NJ: Clinton 63.4 / Sanders 36.6 - My error falls from 5.4% to 5.3%
NM: Clinton 51.5 / Sanders 48.5 - No change
ND: Sanders 71.5 / Clinton 28.5 - No change
SD: Clinton 51.0 / Sanders 49.0 - No change
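
For anyone who wants to redo this as more votes are counted, here is a small sketch that recomputes the error from the first reported result and from the updated count. It again uses Clinton's two-way share from the lists in this post, and the names are only illustrative.

```python
# Clinton's two-way share in percent.
forecast = {"CA": 54.2, "MT": 42.6, "NJ": 68.7, "NM": 56.8, "ND": 46.0, "SD": 54.4}
first_count = {"CA": 56.4, "MT": 46.6, "NJ": 63.3, "NM": 51.5, "ND": 28.5, "SD": 51.0}
updated_count = {"CA": 55.1, "MT": 46.2, "NJ": 63.4, "NM": 51.5, "ND": 28.5, "SD": 51.0}


def abs_error(predicted, result):
    """Absolute error in points, rounded to one decimal."""
    return round(abs(predicted - result), 1)


# Map each state to (error on first count, error on updated count).
changes = {
    state: (
        abs_error(forecast[state], first_count[state]),
        abs_error(forecast[state], updated_count[state]),
    )
    for state in forecast
}

for state, (old, new) in changes.items():
    note = "no change" if old == new else f"{old:.1f} -> {new:.1f}"
    print(f"{state}: {note}")
```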