by not-elite 1712 days ago
The formula for error bars under the traditional Binomial assumption is:

   +/- sqrt(p * (1-p) / N)

So the errors are around +/- 1% for most values of p in this article. The article (rightly) points out that the binomial assumption is not reasonable given the survey method.
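
For anyone who wants to plug numbers in, here is a minimal Python sketch of that formula. The sample size `n = 2500` is a made-up value for illustration, not the poll's actual N, and the `z = 1.96` multiplier for a 95% interval is an addition that the bare formula above leaves out.

```python
import math


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion, assuming independent draws."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation interval (z = 1.96 for ~95%)."""
    return z * binomial_standard_error(p, n)


# Hypothetical sample size, chosen only for illustration.
n = 2500
for p in (0.1, 0.3, 0.5):
    se = binomial_standard_error(p, n)
    moe = margin_of_error(p, n)
    print(f"p={p:.1f}  SE={se:.4f}  95% MOE={moe:.4f}")
```

The standard error is largest at p = 0.5 and shrinks toward the extremes, which is why a single figure can cover "most values of p".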

https://stats.stackexchange.com/questions/29641/standard-err...


The traditional binomial assumption assumes independence of the samples, as the accepted answer states. The authors are acknowledging that this assumption does not hold.
It's an odd disclaimer. If it's a quota-based sample, then its performance in terms of nominal coverage is going to look similar to that of a probabilistic sample (especially given the current environment for probabilistic samples, which is quite poor). Probabilistic samples don't report design effects or adjust MOEs to reflect them either; the norm is just to report a classical normal-approximation binomial confidence interval (e.g. +/- 1/sqrt(n)) even when a real design effect exists.

My guess is the reason for this disclaimer is that it's not a quota sample, it's just literally a completely undirected opt-in survey and there's no reason to believe this is anything resembling a representative sample, probabilistic or not.

This disclaimer is applied to pretty much all polling done online, even if the samples are weighted to match the population. If you go look at election polls, for example, the non-IVR ones will all have something like this, or a "If this were a traditional phone poll, the margin of error would be..."

Online polls are usually done by letting people opt in and then sorting and weighting to sample, just like phone polls. The idea, though, is that because they aren't reached "randomly" in the first place (as they are by war dialing phone numbers, for example), there are additional sampling biases at play that a margin of error doesn't account for.

Leger is a legitimate polling company in Canada, and I doubt they did it any differently from how they do it for election polls. I'm not sure why people are assuming this was just an unweighted Facebook poll or something.

But the reality is that "traditional polls" are probably no better because of the extremely high non-response rate these days. Inertia is a thing though.

An article about this subject: https://www.huffpost.com/entry/margin-of-error-debate_n_6565...

You replied like you were replying to me. I literally do this for a living. My original post speaks to all of this.

Quota-based sampling (a preferable term to convenience sampling) means non-random samples that allow respondents to opt in, but with invitations to opt in extended to a large pool about whom basic information is gathered in advance, in a manner that fills quotas on population weight targets, often with additional post-survey weighting to hit interaction terms in the targets. It is non-probabilistic, but its performance is fine. Indeed, given reasonable quotas, the coverage of the classical MOE (as I said, a conservative normal-approximation binomial CI, e.g. +/- 1.96 * sqrt(0.5 * 0.5 / n) ~= +/- 1/sqrt(n)) is about the same as it is in a probabilistic survey. If your quotas are exactly correct then it's literally the same.
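
To make the approximation concrete, here is a rough Python sketch comparing the conservative 95% margin of error (worst case p = 0.5) with the 1/sqrt(n) rule of thumb. The sample sizes are arbitrary illustrations, not values from any poll discussed here.

```python
import math


def conservative_moe(n: int, z: float = 1.96) -> float:
    """Worst-case (p = 0.5) normal-approximation margin of error."""
    return z * math.sqrt(0.5 * 0.5 / n)


def rule_of_thumb_moe(n: int) -> float:
    """The 1/sqrt(n) shortcut."""
    return 1.0 / math.sqrt(n)


# Arbitrary sample sizes, for illustration only.
for n in (400, 1000, 2500):
    exact = conservative_moe(n)
    approx = rule_of_thumb_moe(n)
    print(f"n={n}: 1.96*sqrt(0.25/n)={exact:.4f}  1/sqrt(n)={approx:.4f}")
```

Since 1.96 * 0.5 = 0.98, the shortcut overstates the conservative margin by about 2% at every n, which is why the two are treated as interchangeable.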

As you allude to by linking that article, sampling error is a small component of the total survey error (TSE) framework. And crucially, both probabilistic and quota-based samples typically do weighting to targets after they get their sample, and neither typically reports the design effect (i.e. how the choice to weight affects the sample variance) when reporting results. The choice not to be honest about design effects is a shame of the polling industry. It probably leads to a good deal of "movement" in the polls being completely illusory, which was part of Gelman's point in his earlier writing on the subject.
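
One common way to put a number on a weighting design effect is Kish's approximation, deff = n * sum(w^2) / (sum(w))^2. The Python sketch below uses it as an assumption for illustration; it is not a method named anywhere in this thread, the weights are invented, and the approximation only captures variance inflation from unequal weights, not bias.

```python
import math


def kish_design_effect(weights: list[float]) -> float:
    """Kish's approximate design effect from unequal weights."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)


def effective_sample_size(weights: list[float]) -> float:
    """Nominal sample size deflated by the design effect."""
    return len(weights) / kish_design_effect(weights)


def design_adjusted_moe(p: float, weights: list[float], z: float = 1.96) -> float:
    """Normal-approximation MOE using the effective sample size."""
    n_eff = effective_sample_size(weights)
    return z * math.sqrt(p * (1.0 - p) / n_eff)


# Invented weights: half the respondents are weighted up by a factor of 3.
weights = [1.0] * 500 + [3.0] * 500
print("design effect:", kish_design_effect(weights))
print("effective n:", effective_sample_size(weights))
print("adjusted MOE at p=0.5:", round(design_adjusted_moe(0.5, weights), 4))
```

With equal weights the design effect is exactly 1 and the adjusted margin matches the classical one; the more uneven the weights, the smaller the effective sample and the wider the honest interval.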

What I don't understand is why you would report an estimate like this and not attempt to report any uncertainty. The reader is not likely to take away "design-based inference considerations require that we refuse to state a classical MOE representing sampling error on principled grounds", and instead is likely to take away "number in headline = correct".

I don't think that they "just did a dumb Facebook poll". I am concerned that they did not do a defensible quota sample, or that they don't have reasonable population weight targets, and that this may be the reason for the failure to state any measure of uncertainty.

The article you linked is very, very old, reflecting a fear of convenience sampling within AAPOR a decade ago. YouGov more or less won that argument.

Looking back at your post, I definitely see better where you were coming from, and I do think I was in error to respond to you; you're right. I was frustrated at a few different posts in this thread and maybe conflated some of what you were saying with what others were saying.

I think you are probably right about the poll as well; there's a more specific statement about methodology in Leger's own release about the poll[1], and it does seem as if they just pulled people at random from their panel. The release does not mention weighting them, which is surprising to me.

[1] https://leger360.com/surveys/legers-north-american-tracker-o...