by mytype 5805 days ago
Yes, it's a biased sample. I've already admitted that. My point is that it's not egregiously more biased than any other sample from academia or professional surveyors, and in many cases it's less biased.

Do you reject most academic psychology research because it is based on students at the researcher's own college? Certainly that's a more biased sample.

Do you reject political polls because they're based on who answers calls from random phone numbers and then does not hang up once they realize it's a pollster?

Do you reject most commercial research based on paid volunteers or visitors to sites with much smaller and more biased audiences than Facebook?

I state upfront in the article that this is based on MyType users who are on Facebook. What more do you want? If you want no bias, just do math and don't believe any data based on people, written by people, spoken by people, anything having to do with people.

The bottom line is, this is much more rigorous than much of the crap that blogs and media report. I'll take a random example that I googled for the iPad: http://techcrunch.com/2010/04/06/ipad-sentiment-analysis/. "87% of tweets indicate intent to purchase the iPad". Give me a break. Talk about bias. The sampling errors there are horrific.

I'm just trying to maintain a reasonable perspective on MyType's data, not hide any facts about its shortcomings. There are shortcomings; they're just not so bad as to make the results "entirely unreliable".

2 comments

"87% of tweets indicate intent to purchase the iPad"

Oddly enough, I have less of a problem with that article than I do with your post.

They clearly acknowledge all the limitations of the data right up front.

I can read that article and understand within the first 2-3 sentences that they are playing a game of mental masturbation, and then grin at the conclusions.

It's clearly a pointless piece of puff, and perfectly enjoyable as such.

My problem with your blog post arises because it is inviting me to take it more seriously than that.

You state upfront that it is based on MyType users who are on Facebook and who participated in a personality quiz.

The question you never speak to, and need to answer, is why on earth you believe that a narrow sample like that can reasonably be used to draw conclusions about the broader set of iPad users.

Do you fully intend the blog post to be a pointless piece of puffery similar to the TechCrunch article? In that case, make that explicit.

Do you actually believe that you can, using the statistics you have available, speak usefully about the broader set of iPad owners? Then explain why, giving your confidence level and the other assumptions you have made.

If you want me to take it seriously, you need to take it seriously.

Bias in sample data is unavoidable, but the bias should be clearly called out before, during and after the conclusions to ensure that the context is not missed.

And yes, I do reject any poll that does not take the idea of sample context and data bias seriously, regardless of its source.

If you do not clearly acknowledge the limitations of the data you have, you might just as well spend your time making numbers up.