Why Polling Can Be So Hard

Some states and congressional districts are a lot harder to poll than others, and this can have major consequences.

By Nate Cohn, New York Times

Take Wisconsin. The polls there were misleading in 2016. Hillary Clinton, apparently feeling she had a secure lead, didn’t even campaign in the state.

The Wisconsin polls probably underestimated the number of white voters without a college degree, or missed the undecided voters breaking for Donald Trump. Those were problems in many states, but might have been particularly bad in Wisconsin, which has a big share of white working-class voters.

But as we’ve been reminded while conducting New York Times Upshot/Siena College polling in recent weeks, the state also has one of the nation’s least helpful voter registration files. These are the foundation of modern campaign polling: a data set of every registered voter.

In general, pollsters use data from voter files to ensure that their samples are representative — by having the right number of younger voters or older voters, for example, or Democrats or Republicans. Without that information, polling becomes a lot more difficult. In Wisconsin, pollsters have less data to work with.

Here’s a tour of some of the challenges we’ve tackled in conducting polls of a dozen of the most competitive congressional districts, what we’ve done about them, and what we plan to change in the weeks before the November midterm elections.

— Partisanship as a Predictor

For pollsters, the most important data on the voter file is partisanship. This means a person’s party registration or, in the states without it, a person’s history of participating in partisan primaries.

With this data, pollsters can ensure they have the right number of Democrats or Republicans in their polls, regardless of who is likelier to pick up the phone.

In the end, a direct measure of partisanship is the strongest predictor of vote choice, despite what you’ve perhaps read about fancy microtargeting models that claim to nail down attitudes of voters based on something like the car they drive. The simple, old-fashioned measures of partisanship do the bulk of the work in the models and the polls.

But Wisconsin is one of a handful of states where none of this data is available. There’s no party registration, and the voter file indicates only whether someone voted in a primary, not which one. Minnesota, North Dakota and Montana are also in this category.

It’s an even greater challenge because these states, with a lot of white, rural residents, have relatively few other characteristics that strongly correlate with partisan vote choice.

In other parts of the country, weighting — giving more weight to respondents from underrepresented groups, to ensure the sample reflects the demographic profile of likely voters — can get you a long way because characteristics like race, age and education are also correlated with partisan vote choice.
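The weighting idea described above can be sketched in a few lines. This is a simplified illustration, not the Times' actual procedure; the group labels and target shares are hypothetical, and real pollsters typically weight on many characteristics at once (often with iterative raking rather than a single pass).

```python
# Minimal sketch of cell weighting: each respondent gets a weight equal to
# the group's share of likely voters divided by its share of respondents.
# Group labels and target shares here are illustrative, not real poll data.
from collections import Counter

def cell_weights(respondent_groups, target_shares):
    """Return one weight per respondent so that the weighted sample
    matches the target demographic shares."""
    n = len(respondent_groups)
    sample_shares = {g: c / n for g, c in Counter(respondent_groups).items()}
    return [target_shares[g] / sample_shares[g] for g in respondent_groups]

# Toy example: college-educated voters answered at twice their real share.
respondents = ["college"] * 60 + ["non_college"] * 40
targets = {"college": 0.30, "non_college": 0.70}
weights = cell_weights(respondents, targets)
# College respondents are weighted down (0.5), non-college up (1.75),
# so the weighted sample is 30% college / 70% non-college.
```

The point of the Wisconsin problem is visible here: this only works if you know the right target shares, and a characteristic is only worth weighting on if it predicts vote choice.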

In our initial Wisconsin and Minnesota polls, we took an all-of-the-above approach. Not only did we weight by those standard categories, but we also tried to estimate the likely partisanship of our respondents using our prior polling data. Ultimately, this was only somewhat successful.

So far, we’re observing a pretty modest response bias toward Democrats, particularly in well-educated districts. You can see that for yourself by looking at the “if we didn’t weight by party” option in the weighting section of our live poll pages. This would seem to imply that, in general, we’d be at a greater risk of overestimating Democrats in places like Minnesota or Wisconsin.

We’re not going to shy away from polling the most important districts. But at the margins, we’d prefer to poll in the places where we can leverage as much data as we can. Minnesota’s 1st District, for instance, is the kind of race that we would probably poll if it had party registration. We’ll probably avoid it.

— Tough Demographics

In general, urban, nonwhite and young voters are harder to reach than rural, white and older voters. In places with a lot of the first three kinds of voters, things can get tough quickly for pollsters.

Florida’s 26th District was a fight for every respondent. (We made about 46,000 calls to yield about 500 responses.) The district is mainly in Miami-Dade County, and more than 60 percent of voters are Hispanic.

A quick, easy poll would have had far too many non-Hispanic respondents. Indeed, they outnumbered Hispanic respondents early in our survey. But we employ bilingual interviewers, and we call back voters multiple times. After days of callbacks in Miami-Dade County, the Republican incumbent, Carlos Curbelo, took a lead.

We’re probably going to move toward a more rigorous way of dealing with this: stratification, in which we essentially break up the sample into mutually exclusive groups and treat each like a separate sample. That will help us make sure we don’t overlook a group that’s less likely to respond to a survey.
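The stratification idea can be illustrated with a short sketch: estimate each mutually exclusive group separately, then combine the group estimates using each group's known share of the electorate. The strata, response counts, and shares below are hypothetical, chosen only to echo the Florida example.

```python
# Sketch of a stratified estimator: treat each mutually exclusive group as
# its own sample, then combine per-group estimates by each group's known
# share of likely voters. All numbers here are made up for illustration.

def stratified_estimate(responses_by_stratum, electorate_shares):
    """responses_by_stratum: {stratum: list of 1/0 votes for candidate A}.
    electorate_shares: {stratum: share of likely voters}, summing to 1."""
    estimate = 0.0
    for stratum, responses in responses_by_stratum.items():
        stratum_mean = sum(responses) / len(responses)
        estimate += electorate_shares[stratum] * stratum_mean
    return estimate

# Hispanic voters are a majority of the district but under-respond;
# stratifying keeps them from being swamped by the easier-to-reach group.
responses = {
    "hispanic": [1] * 30 + [0] * 20,      # 50 respondents, 60% favor A
    "non_hispanic": [1] * 40 + [0] * 60,  # 100 respondents, 40% favor A
}
shares = {"hispanic": 0.6, "non_hispanic": 0.4}
support = stratified_estimate(responses, shares)
# 0.6 * 0.6 + 0.4 * 0.4 = 0.52, versus 70/150 ≈ 0.47 unstratified.
```

The design benefit over pure after-the-fact weighting is that a hard-to-reach stratum can't simply be missing: each group is pursued until it has enough respondents to stand on its own.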

— What’s Your Number?

Another challenge is telephone number coverage: which people on the voter file actually have phone numbers.

This is one of the big downsides of polling off a voter file (versus using random-digit dialing), and the severity of the disadvantage varies a lot from state to state.

On paper, most of the voter files look about the same. More than half of voters have a telephone number. Around half of those voters have a cellphone number. In general, older, white, longtime voters are likeliest to have telephone numbers.

We adjust for this by drawing our sample in tiny groups of voters, like white, high-turnout Democratic men from a rural area who are 18 to 34. We make sure we draw the right number of telephone numbers from each group, regardless of how many people in each group actually have a number associated with their voter file.
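A rough sketch of that drawing step: allocate telephone-number draws to each small cell in proportion to the cell's number of voters, not its number of listed phones. The cell definitions and counts are invented for illustration and are much coarser than the real cells described above.

```python
# Sketch of drawing a phone sample by voter-file cell, so each cell
# contributes numbers in proportion to its size among likely voters rather
# than in proportion to its phone coverage. Counts are illustrative only.

def draw_allocation(cells, total_draws):
    """cells: {cell_name: (voters_in_cell, phone_numbers_on_file)}.
    Returns numbers to draw per cell, proportional to voter count,
    capped at what the file actually has for that cell."""
    total_voters = sum(voters for voters, _ in cells.values())
    draws = {}
    for name, (voters, phones) in cells.items():
        target = round(total_draws * voters / total_voters)
        draws[name] = min(target, phones)  # can't draw what isn't there
    return draws

cells = {
    "young_urban": (20_000, 4_000),   # low phone coverage
    "older_rural": (20_000, 16_000),  # high phone coverage
}
draws = draw_allocation(cells, total_draws=1_000)
# Both cells get 500 draws despite very different phone coverage.
```

The cap in the last step is also where the limits of the method show: if a cell has too few numbers on file, no allocation rule can conjure more, which is exactly the cellphone-coverage problem discussed next.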

One thing that’s harder to adjust for is the proportion of younger voters with a cellphone number. There are some states where we have a lot of younger voters with a cellphone, and others where we have virtually none.

This makes a big difference, since we have a pretty good response rate among younger voters with a cellphone and a terrible one among those with a landline. In many cases, I suspect, those voters provided their parents’ landline telephone numbers when they first registered to vote.

But the challenge is even greater than it seems. The few cellphone numbers we do have in these low-coverage states are notably less likely to yield a response than in the other states. So not only do we have fewer cellphone numbers, but those numbers appear to be worse, too.

The result: In places like Virginia’s 7th, Texas’ 7th or Kentucky’s 6th, we’ve really struggled to contact younger voters. In other places, where we’ve had ample cellphone numbers, we’ve often had too many younger voters.

We’re not alone on this. The Monmouth poll, for instance, works off the same lists that we do.

We have a few adjustments in store to try to deal with this in our last wave of polls. We’ll expand our young voter weight from 18-34 to 18-39 in some states. We’ll weight using self-reported and not voter-file age (the “young” people we get in these tough states occasionally turn out to be their parents, despite our best efforts). And we can have more telephone numbers from young voters ready to go in those states.

These problems can’t really be solved, but they can be mitigated, and we’ll keep trying to do that in the seven weeks until Election Day.

Copyright 2024 New York Times News Service. All rights reserved.