Danger in the Polling Data

The devil in the data that left election forecasters with egg on their faces this week has a familiar name: it’s the same villain that tripped up the banks that financed subprime mortgages back in 2007, causing the financial crisis. Its name is “correlated error.” Prediction models can make very accurate forecasts from many not-so-accurate data points, but they depend on one crucial assumption: that the errors in those data points are all independent. In election forecasting, the data points are polls, which are clearly imperfect. Every individual poll has a relatively large margin of error amounting to several percentage points, sometimes favoring one candidate, sometimes the other, and skewed by hundreds of small things: the specific respondents chosen, the means of contact, the phrasing of questions, the representation of voter demographics and so on. Poll aggregation can smooth these errors out, giving a much more accurate mean polling number, provided the errors in individual polls are all due to different causes and are therefore independent and uncorrelated. We saw …
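To make the independence assumption concrete, here is a minimal sketch in Python of how the error of a poll average behaves when the individual poll errors share a common correlation. It uses the standard result for the variance of a mean of equally correlated variables. The function name `aggregate_error` and the numbers (25 polls, a 3-point error per poll, a correlation of 0.5) are illustrative assumptions, not figures from any real election or from the forecasters' actual models.

```python
import math


def aggregate_error(poll_sd: float, n_polls: int, rho: float) -> float:
    """Standard deviation of the average of n_polls poll results.

    Assumes (for illustration only) that every poll has the same error
    standard deviation poll_sd and that every pair of polls has the same
    error correlation rho, with 0 <= rho <= 1.
    """
    if n_polls < 1:
        raise ValueError("n_polls must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1")
    # Var(mean) = sd^2 * ((1 - rho) / n + rho):
    # the (1 - rho) / n part shrinks as polls are added, the rho part does not.
    variance = poll_sd ** 2 * ((1.0 - rho) / n_polls + rho)
    return math.sqrt(variance)


# Hypothetical inputs: 25 polls, each off by about 3 points.
independent = aggregate_error(3.0, 25, 0.0)
correlated = aggregate_error(3.0, 25, 0.5)

print(f"independent errors: {independent:.2f} points")  # 0.60
print(f"correlated errors:  {correlated:.2f} points")   # 2.16
```

With independent errors the average of 25 polls is about five times more accurate than a single poll. With a shared correlation of 0.5, the error of the average can never fall below about 2.1 points under these assumptions, no matter how many polls are added.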