Why polls were wrong about NC Senate race
Posted November 6, 2014
Raleigh, N.C. — On Tuesday night, Republican candidate Thom Tillis beat incumbent Democrat Kay Hagan in the U.S. Senate race in North Carolina, winning 49 percent to 47 percent. Libertarian Sean Haugh earned about 4 percent of the vote. Just the day before, I posted that Hagan was likely to win.
Why was I wrong?
The short answer is that it was a Republican wave, and on average, all of the polls were off. I based my prediction on the averages from five of the most respected and accurate pollsters. These all estimated Hagan had a lead of around 1 to 2 percentage points.
In a race decided by a 2-point margin, it's not unusual for the projected narrow winner to end up the actual narrow loser.
Most prognosticators were wrong about this race because polling error across the country in 2014 was correlated with partisanship. Democrats did worse than polls predicted in nearly every state this election cycle. Nate Silver of FiveThirtyEight.com estimated that polling overstated Democrats' support in U.S. Senate races by about 4 percentage points. Republicans won with much larger margins than expected, and no Democrats bucked the prediction that they would lose.
The inflated support for Democrats in the polls does not back up conspiracy-minded claims that pollsters deliberately helped Democrats look better. In 2012, for example, polling bias inflated Republicans' support compared to actual votes. Looking at the graph generated at FiveThirtyEight, it is clear that the direction of the bias is random from year to year. In some years, Democrats' support is exaggerated; in other years, Republicans' support is inflated. Pollsters have every reason to be accurate, or else the press and citizens will view their results with suspicion.
No professional pollster wants to get it wrong. But polls aren't 100 percent accurate, and polling error isn't perfectly random.
Where does polling bias come from?
So, why does partisan bias occur within any single election? The main reason is that, unlike a poll of all adults, election polls care only about what voters think. Yet, there is no foolproof way of knowing exactly which people will show up to vote or stay home. Historical patterns of turnout are useful, but these imperfectly predict any future behavior. Older people, for example, were not just more likely to vote this election than younger adults but also voted at higher rates than expected. Republicans carried this age group handily, helping to explain their larger-than-expected margins of victory.
Quite simply, Democratic candidates did worse everywhere because voters calling themselves Democrats didn't vote at the rates pollsters assumed, which were based largely on turnout in past midterms. Or, conversely, Republicans were more likely to vote than assumed. Election polls ask at least one question trying to gauge whether a registered voter is a likely voter, and usually they ask several questions. My guess is that Republicans were more likely to meet the likely voter criteria than past turnout suggested, so their preferences were not given the weight they should have received.
Another, and more worrisome, possibility is that election polling is deeply flawed. It's increasingly difficult to conduct representative polling, let alone predict likely voters. Everyone in the business is aware of how hard it is to acquire a representative sample. So, nearly all polls "weight" their data afterwards to correct for demographic imbalances.
If a survey interviewed too many women compared with their percentage of the population, then a weight is attached to their answers to reduce their influence on the overall results. The theory behind weighting is that anyone in the poll can stand for a group. If just 5 percent of those who agree to take the survey are 18 to 29 years old, yet we know this group is 10 percent of the population, pollsters weight their answers to count twice as much to make up for the group's underrepresentation in the sample.
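The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration of the basic idea (weight = population share divided by sample share), not the procedure any particular pollster uses. The 5 percent and 10 percent figures come from the example above; the "30+" group is simply everyone else, and the support numbers are made up for the demonstration.

```python
def poststrat_weights(sample_share, population_share):
    """Weight for each group = population share / sample share."""
    return {g: population_share[g] / sample_share[g] for g in sample_share}


def weighted_support(support, sample_share, weights):
    """Overall support after each group's answers are scaled by its weight."""
    total = sum(sample_share[g] * weights[g] for g in support)
    scaled = sum(support[g] * sample_share[g] * weights[g] for g in support)
    return scaled / total


# Shares for ages 18-29 are from the example in the text; "30+" is the remainder.
sample_share = {"18-29": 0.05, "30+": 0.95}
population_share = {"18-29": 0.10, "30+": 0.90}

# HYPOTHETICAL support for a candidate in each group (not from any real poll).
support = {"18-29": 0.60, "30+": 0.45}

weights = poststrat_weights(sample_share, population_share)
unweighted = sum(support[g] * sample_share[g] for g in support)
weighted = weighted_support(support, sample_share, weights)

print(f"Weight for 18-29: {weights['18-29']:.2f}")
print(f"Weight for 30+:   {weights['30+']:.2f}")
print(f"Unweighted support: {unweighted:.1%}")
print(f"Weighted support:   {weighted:.1%}")
```

Each young respondent counts double, and everyone else counts slightly less than once, so the weighted sample matches the population's age mix. Whatever the young respondents said, accurate or not, now carries twice the influence.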
But what if young respondents no longer do a good job of standing for what all young voters think? What if young people willing to talk to pollsters are more liberal than young people who won't? If polls have trouble getting younger people to respond, and weighting gives added influence to unrepresentative answers, that could be one way bias toward Democrats creeps into election polls.
Number of early voters by age
This chart shows the number of voters who cast early votes, by age.
Source: N.C. State Board of Elections.
Indeed, this might have happened. Exit polls suggest young people, those ages 18 to 29, broke for Hagan by a 14-point margin, 53 percent to 39 percent. Yet, polling by Elon University just before the election found this age group preferred Hagan 60 percent to 25 percent, a 35-point gap. Similarly, Public Policy Polling found a 49 percent to 21 percent Hagan lead among young voters, a 28-point gap. This difference between polling and votes could account for a few percentage points of the overall polling error.
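To see how a miss in one age group could translate into "a few percentage points" overall, here is a rough back-of-the-envelope sketch. The margins are the ones quoted above. The share of the electorate made up of 18-to-29-year-olds is not given here, so the values tried below are placeholders, not reported figures. The sketch also assumes each poll weighted young voters to the same share they made up of the actual electorate.

```python
# Hagan's margin among ages 18-29 in the exit poll, in percentage points.
EXIT_POLL_MARGIN = 53 - 39

# Hagan's margin among the same age group in the pre-election polls cited.
POLLED_MARGINS = {
    "Elon University": 60 - 25,
    "Public Policy Polling": 49 - 21,
}


def topline_shift(polled_margin, actual_margin, group_share):
    """Points of overall margin error attributable to one group's miss."""
    return (polled_margin - actual_margin) * group_share


# HYPOTHETICAL shares of the electorate for ages 18-29 (placeholders only).
for share in (0.08, 0.10, 0.12):
    for pollster, margin in POLLED_MARGINS.items():
        shift = topline_shift(margin, EXIT_POLL_MARGIN, share)
        print(f"{pollster}, youth share {share:.0%}: {shift:.1f} points")
```

Under these assumptions, overstating Hagan's lead among young voters by 14 to 21 points moves the overall margin by roughly 1 to 2.5 points, which is on the order of the gap between the pre-election polling averages and the result.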
More definitive answers will take some time to develop, as more data become available. It’s not clear to me if the problem is something easily fixed, such as doing a better job of predicting likely voters. It's possible that non-response bias is increasingly undermining the ability of surveys to obtain samples that stand for the true population, even after weighting them.
Representative polling depends on probability sampling, where everyone in the population has a known, nonzero chance of being contacted. Without it, polling mistakes are likely to increase.