Polling 101
Posted by Richard on March 7, 2011
Legal Insurrection is featuring a three-part series this week on polling by guest poster Matthew Knee. The first part appeared Monday, and based on it, the series looks to be a very valuable primer on the subject:
Analyzing polls with only what polling companies release is a tricky business. Near-ideal poll analysis requires a database of actual, person-by-person responses, expensive software, and advanced mathematics. Ideal poll analysis requires actually being the pollster and having an overstuffed budget. However, there are a number of rules, tips, and tricks that anyone – with a bit of logic and a calculator – can use to draw meaningful conclusions from flawed polls and incomplete information.
I will be addressing these issues in three stages. In the first section, I will talk a bit about how people answer polling questions. In the second, I will discuss samples and biases. In the third, I will discuss techniques for evaluating the seriousness of bias.
All-purpose disclaimer: This series will include approximations and simplifications. It is for understanding media polls, not for writing articles for scholarly journals. It is also not exhaustive. The list of specific problems that can arise, especially in poll wording, is, obviously, enormously long.
Read the whole thing, and read parts 2 and 3 when they appear. You'll be better equipped to understand all that polling data that the MSM throw at you — and to view it with the appropriate amount of skepticism.
David Bryant said
Part 1 was pretty good. Here are links to the second and third parts … I haven’t had time to digest them yet, but thought the links might be useful.
Part 2
Part 3
rgcombs said
Thanks, David! I haven’t read them yet — I think they’ll have to wait until tomorrow.
David Bryant said
I finished reading through Part 2 (”Samples & Margin of Error”), and I’m more than a little disappointed. Math is clearly not Mr. Knee’s long suit.
”The margin of error is a bell curve representing possible outcomes …”
This statement is simply false. There is a theoretical “bell curve” (a Gaussian) that closely approximates how the results of repeated random samples would scatter around the true value for the population as a whole. The commonly cited “margin of error” is a number (not a curve!) describing the width of that curve. In other words, if the poll were conducted many times with different random samples of the same size, about 95% of those polls would be expected to find the same result as the original poll, to within the “margin of error”. Or, to put it another way, there is only a 5% chance that the “true” result for the entire population differs from the sample measurement by more than the “margin of error”.
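This “95% of repeated polls” interpretation is easy to check with a quick simulation. The sketch below (the 52% “true support” figure is made up purely for illustration) draws many random samples of 1,000 voters and counts how often the sample result lands within the stated margin of error of the truth:

```python
import math
import random

random.seed(1)

true_support = 0.52   # hypothetical "true" population value (illustrative)
n = 1000              # sample size

# Standard 95% margin of error for a proportion.
moe = 1.96 * math.sqrt(true_support * (1 - true_support) / n)

trials = 10_000
within = 0
for _ in range(trials):
    # Simulate one poll: each respondent independently says "yes"
    # with probability equal to the true support.
    sample = sum(random.random() < true_support for _ in range(n)) / n
    if abs(sample - true_support) <= moe:
        within += 1

print(f"margin of error: {moe:.3f}")
print(f"fraction of polls within the MOE: {within / trials:.3f}")  # close to 0.95
```

The fraction printed comes out very near 0.95, which is exactly the "95% probability" being described.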
”The stated margin of error for a poll is the margin of error for the entire poll, and is primarily driven by sample size.”
In fact, the stated “margin of error” is driven “entirely” by the size of the sample, not “primarily”. OK, that’s a nit, and Mr. Knee does make some valid observations about sub-populations within a poll. But he fails to give a simple formula for the “margin of error”, which is very useful for careful thinkers: the margin of error is, roughly, the square root of the number of people who answered a yes/no question, taken as a fraction of the sample. For instance, if the poll consults 1,000 people, the margin of error is roughly 3.2% (because the square root of 1,000 is very nearly 32, and 32 is 3.2% of 1,000). If the same poll consults 500 Democrats and 500 Republicans, the “margin of error” for each party taken as a sub-population is roughly 4.4% (again, because the square root of 500 is 22, more or less, and 22 is 4.4% of 500).
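The rule of thumb can be compared directly against the textbook formula. A minimal sketch (function names are mine, for illustration):

```python
import math

def rough_moe(n):
    """Rule of thumb: sqrt(n) respondents out of n, i.e. 1/sqrt(n)."""
    return math.sqrt(n) / n

def exact_moe(n, p=0.5):
    """Standard 95% margin of error for a proportion p with sample size n."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (1000, 500):
    print(f"n={n}:  rule of thumb {rough_moe(n):.1%},  textbook {exact_moe(n):.1%}")
```

For n = 1,000 the rule of thumb gives about 3.2% against a textbook 3.1%; for n = 500 it gives about 4.5% against 4.4%. The agreement is no accident: at p = 0.5 the textbook formula reduces to 1.96 × 0.5 / √n ≈ 0.98 / √n, which is nearly 1/√n.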
—
Maybe I’m just too picky about math. I’m still disappointed by Part 2.
Matthew Knee said
David –
1. The margin of error CAN be affected by population size in certain cases. This almost never happens in political polling, but I said primarily because this is possible.
2. The margin of error technically describes the bell curve rather than being the bell curve, but I did not want to confuse general audiences with the extra layers of complexity about distributions and confidence intervals; I wanted to bring home the point that possible outcomes are not distributed evenly. I do explain the definition of margin of error exactly as you described. Still, I might go back and clarify that sentence, so thanks for the catch.
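Matthew’s first point refers to what statisticians call the finite population correction: when the sample is a sizable fraction of the whole population, the margin of error shrinks. For national polls the correction is vanishingly small, which is why it is almost always ignored. A sketch, with made-up example numbers:

```python
import math

def moe(n, population=None, p=0.5):
    """95% margin of error, with an optional finite population correction."""
    m = 1.96 * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction factor.
        m *= math.sqrt((population - n) / (population - 1))
    return m

# 1,000 voters sampled from an electorate of 200 million: correction is negligible.
print(f"large electorate: {moe(1000, population=200_000_000):.4f}")
# 1,000 voters sampled from a small town of 2,000: MOE drops noticeably.
print(f"small town:       {moe(1000, population=2_000):.4f}")
```

Sampling half of a town of 2,000 cuts the margin of error by roughly 30%, but sampling 1,000 out of 200 million changes it by less than a thousandth of a point.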
rgcombs said
Matthew, thanks for dropping by!
Back in college, I managed a B in Intro Psych Statistics without attending a single lecture, but — as David can attest — these days, too much math makes my brain ache. So I’ll defer to you guys on the technical arguments. It’s Cal Tech vs. Yale, and way out of my league! 🙂
David Bryant said
Mr. Knee has actually posted five items about polling this week. Here are some more links.
Part 4
Part 5
David Bryant said
I have finished reading all five installments of Mr. Knee’s course on polling. Part three — an analysis of the PPP poll that showed slipping support for Scott Walker — left me confused. I guess he was trying to make a point: the bias inherent in the sample (too many union members, and too many people who had voted against Walker last November) does not completely explain the result, so Walker’s public support has indeed slipped a few percentage points, but probably not as much as the reported 7%. Unfortunately, I really had to struggle to follow his argument.
Parts 4 and 5 were better. In part 5 he actually alludes to the biggest problem with complex polls that attempt to isolate ideological trends: it’s simply impossible to ask enough people enough unbiased questions to get at the big picture with any reasonable degree of precision.
Here’s where he might have offered his readers a little more information about statistical methods. The mathematical tools pollsters use to gauge the reliability of their estimates were originally devised in the context of physical experimentation. Those tools rely on a simple model. We have a set of measurements — all of the same phenomenon — that are subject to random error. How can we estimate the size of the inherent errors from these data, and how close is the average of all our measurements to the “true” result?
The underlying mathematical assumption on which classical statistical methods rely is that the “errors” are independently and identically distributed. This is an adequate model for political polls that ask a simple yes/no question (Will you vote for candidate A, or B?). People may change their minds between the time the poll is taken and election day, but in the absence of some major event that changes everybody’s perceptions, the “A” voter is just about as likely to change his mind as the “B” voter is. So simple “A” or “B” polls are usually quite accurate.
The simple yes/no model can be jazzed up a little bit to cover more complicated questions (Do you favor or oppose, or are you neutral?). But the underlying assumption of classical statistical methods — that people are random members of a population that is “independently and identically distributed” — is simply not accurate when the whole complex of political and economic ideas comes into play. People can take independent actions, but our ideas, and our values, and our choices, are heavily influenced by the views of our neighbors. And we’re certainly not identical — not in terms of our wants and needs and abilities, at least.
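The consequence of that broken assumption can be made concrete with a simulation. The sketch below compares the spread of poll results when respondents answer independently versus when they come in like-minded “clusters” (households or neighborhoods that mostly share one view); the cluster size of 10 is made up purely for illustration:

```python
import math
import random
import statistics

random.seed(2)

true_support = 0.5
n = 1000
trials = 2000

def independent_poll():
    # Each respondent answers independently.
    return sum(random.random() < true_support for _ in range(n)) / n

def clustered_poll(cluster_size=10):
    # Each cluster of 10 respondents shares one opinion,
    # so the poll effectively contains only 100 independent voices.
    clusters = n // cluster_size
    yes = sum(cluster_size for _ in range(clusters)
              if random.random() < true_support)
    return yes / n

ind = [independent_poll() for _ in range(trials)]
clu = [clustered_poll() for _ in range(trials)]

print(f"std dev, independent respondents: {statistics.stdev(ind):.4f}")
print(f"std dev, clustered respondents:   {statistics.stdev(clu):.4f}")
```

The clustered polls scatter roughly √10 ≈ 3 times as widely, even though the sample size, and therefore the stated “margin of error”, would be identical on paper. That is the sense in which correlated opinions quietly invalidate the textbook error math.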
Anyway, I like Mr. Knee’s general inclination toward fiscal conservatism, and I hope he uses his skills to focus public attention on the kind of questions politicians really ought to ask: what can I do to make everybody more successful in their pursuit of happiness?