Margin Of Error: Breaking down the polls

Trust an Internet poll? The New York Times did

Posted August 4, 2014 6:04 p.m. EDT
Updated August 5, 2014 10:56 a.m. EDT

House Speaker Thom Tillis is running, and polls show him trailing, in the race to represent North Carolina in the U.S. Senate.

Only one poll that I know of since mid-May finds Republican Thom Tillis with a lead over incumbent Democrat Kay Hagan in North Carolina's U.S. Senate race.

That poll, conducted by “YouGov,” was commissioned by CBS/NY Times and reported a 1 percentage point lead for Tillis. Of course, a 1-point lead falls well within the margin of error of a typical poll, so it is really a dead heat (i.e., “too close to call”). Nevertheless, this poll has sparked controversy and soul-searching among industry professionals.
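To see why a 1-point lead counts as a dead heat, here is a minimal sketch of the textbook margin-of-error calculation for the gap between two candidates. It assumes a simple random sample, which is exactly the assumption in dispute for an opt-in panel, and the vote shares and sample size are hypothetical, not figures from the YouGov poll. The function name `lead_margin_of_error` is likewise just an illustration.

```python
import math

def lead_margin_of_error(p_a, p_b, n, z=1.96):
    """Approximate 95% margin of error on the lead (p_a - p_b).

    Assumes a simple random sample of size n in which each respondent
    picks at most one candidate (multinomial sampling).
    """
    variance = (p_a + p_b - (p_a - p_b) ** 2) / n
    return z * math.sqrt(variance)

# Hypothetical numbers, NOT taken from the YouGov poll.
tillis, hagan, sample_size = 0.48, 0.47, 1000
lead = tillis - hagan
moe = lead_margin_of_error(tillis, hagan, sample_size)

print(f"lead = {lead:.1%}, margin of error on the lead = {moe:.1%}")
print("too close to call" if abs(lead) < moe else "outside the margin of error")
```

With these made-up inputs the margin of error on the lead is roughly 6 points, so a 1-point gap cannot be distinguished from a tie. Note that the uncertainty on the gap is larger than the margin of error usually quoted for a single candidate's share.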

The controversy is less about the poll's results, although those are clearly in dispute too, than about how the poll was conducted and the decision by The New York Times to stand behind it.

YouGov conducts its surveys entirely online. Adults are recruited to take the surveys, but the critical factor is that they are not chosen through probability methods, in which all persons in the defined population (say, in North Carolina) have a known (in the simplest case, equal) chance of being asked for their opinion.

It is an understatement to point out that probability-based sampling methods have enabled large-scale representative surveys to be successful for almost a century.

Nate Cohn recently explained, in a post for the Times' “Upshot” blog, the issues at stake. This is a big deal.

As reported in Politico, the American Association for Public Opinion Research (AAPOR) has issued a statement highly critical of The New York Times and CBS News for promoting survey results obtained through a methodology devoid of a theoretical underpinning.

I don’t have a lot to add to what has already been said, especially after the Pew Research Center did a great job of identifying the key aspects of the controversy and Upshot Editor David Leonhardt agreed to an interview with Chris Cillizza from The Washington Post. I thought this development was worth highlighting, though, because the decision by a major news organization to promote survey data collected via a non-probability sampling technique is, well, newsworthy.

If I had anything to contribute, it would be the following.

I used to be “old school” about sampling methods – until a few months ago. By that, I mean I would have dismissed the YouGov poll almost out of hand, as many others have done.

Yet, the truth of the matter is that probability sampling is itself broken, and polls that purport to use it depend on weighting their data afterwards. I am unaware of any national or state-level poll that is reported without weighting, because the raw data on their own are inaccurate. Response rates for phone surveys have dipped into the single digits. A growing segment of the population cannot be reached over a landline. It's not clear to me how different a phone poll, where nearly half the population can’t be reached that way or won’t answer, really is from a poll where people opt in to the sample and are contacted only via email.
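For readers unfamiliar with weighting, the following is a minimal sketch of one common approach, post-stratification on a single demographic variable. It is not the procedure used by YouGov or by any pollster named here. The age groups, population shares and responses are all hypothetical, and the function name `poststratified_estimate` is made up for the illustration.

```python
from collections import Counter

def poststratified_estimate(respondents, population_shares):
    """Weighted share of respondents answering 1.

    respondents: list of (group, answer) pairs, answer is 0 or 1.
    population_shares: dict mapping group -> share of the target population.
    Each respondent is weighted by population share / sample share of
    their group, so over-represented groups count for less.
    """
    n = len(respondents)
    counts = Counter(group for group, _ in respondents)
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}
    weighted_sum = sum(weights[g] * answer for g, answer in respondents)
    total_weight = sum(weights[g] for g, _ in respondents)
    return weighted_sum / total_weight

# Hypothetical sample: older adults are over-represented (800 of 1,000).
sample = (
    [("under_45", 1)] * 80 + [("under_45", 0)] * 120
    + [("45_plus", 1)] * 400 + [("45_plus", 0)] * 400
)
# Hypothetical population: the two age groups are equal in size.
shares = {"under_45": 0.5, "45_plus": 0.5}

raw = sum(answer for _, answer in sample) / len(sample)
weighted = poststratified_estimate(sample, shares)
print(f"raw support = {raw:.1%}, weighted support = {weighted:.1%}")
```

In this invented example the raw figure is 48 percent, and weighting pulls it down to 45 percent because the under-represented younger group is less supportive. The same adjustment applies whether the sample came from phone calls or an opt-in panel, which is part of why the line between the two is blurrier than it first appears.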

The only online panel that I personally have confidence in using is GfK, formerly “Knowledge Networks.” The difference is that its panelists are recruited via probability sampling, which finds adults via landlines, cellphones and the Internet who are representative of the population, and only then are survey samples drawn from within this very large panel. The debate about the validity of YouGov is far from settled, but this really is the cutting edge of either a brand new way to conduct reliable surveys or the tip of the iceberg about to sink the Titanic.