HN Mirror


by gabestein 4761 days ago

Thanks for the feedback. As I'm fairly new to all of this, I'd love to hear how I can turn this into a more viable experiment.

Also, to defend myself a little: I think I'm at least being responsible. I never claim the result is statistically valid; in fact, the article says several times that it isn't and cites the reasons why.

I do stand by my point, however, that this helps the public understand the danger of metadata for data mining and introduces them to the pitfalls of statistics.


quchen 4761 days ago

> As I'm fairly new to all of this, I'd love to hear how I can turn this into a more viable experiment.

The most important thing is getting an estimate of how good your averages are. Try modifying a couple of things. For example, how does the 67% figure change with sample size (verification)? How does the number turn out if you feed it biased data, e.g. only male phone numbers (falsification)?
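A minimal sketch of both checks, assuming synthetic stand-in data and a toy threshold classifier rather than the article's actual model or dataset. The names `make_sample`, `train_threshold`, `accuracy`, and `learning_curve` are hypothetical and exist only for illustration. The idea is to retrain many times at each training size and report the mean accuracy together with its spread, then repeat with deliberately one-sided training data.

```python
import random
import statistics


def make_sample(rng, n, p_label=0.5):
    """Synthetic stand-in data (an assumption): one numeric feature, one binary label."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < p_label else 0
        feature = rng.gauss(1.0 if label else 0.0, 1.0)
        rows.append((feature, label))
    return rows


def train_threshold(rows):
    """Toy model: put the decision threshold halfway between the two class means."""
    ones = [f for f, y in rows if y == 1]
    zeros = [f for f, y in rows if y == 0]
    if not zeros:
        return float("-inf")  # only label 1 seen: always predict 1
    if not ones:
        return float("inf")   # only label 0 seen: always predict 0
    return (statistics.mean(ones) + statistics.mean(zeros)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where 'feature > threshold' matches the label."""
    hits = sum((f > threshold) == bool(y) for f, y in rows)
    return hits / len(rows)


def learning_curve(sizes, repeats=30, seed=0, train_p_label=0.5):
    """Map each training size to (mean accuracy, standard deviation) over repeats."""
    rng = random.Random(seed)
    test = make_sample(rng, 2000)  # balanced held-out set, reused for every run
    curve = {}
    for n in sizes:
        scores = [
            accuracy(train_threshold(make_sample(rng, n, train_p_label)), test)
            for _ in range(repeats)
        ]
        curve[n] = (statistics.mean(scores), statistics.stdev(scores))
    return curve


if __name__ == "__main__":
    # Verification: does the number settle down as the training set grows?
    for n, (mean, sd) in learning_curve([10, 100, 1000]).items():
        print(f"n={n:5d}  accuracy={mean:.3f} +/- {sd:.3f}")
    # Falsification: train on one class only and see whether the number moves.
    print(learning_curve([200], train_p_label=1.0))
```

If the reported accuracy barely moves with training size, or stays the same when the training data is clearly biased, that suggests the number reflects something other than what the model learned.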


gabestein 4761 days ago

Thanks. I solemnly swear to not abuse statistics, so I'll play around and update the piece.

The goal is to use this as a hook to get the public interested, then take them for the ride as I learn. That's why I tried to be really cautious about pointing out all the reasons why this particular model is bad.


quchen 4761 days ago

I didn't mean to say you specifically were abusing statistics; it's just very easy to draw dubious conclusions. (Doing this intentionally would be the abuse I was talking about.)


gabestein 4761 days ago

I know, but I still want to take it seriously, and thank you for your criticism.


gabestein 4761 days ago

Thanks to your help, I found a huge, obvious problem that was keeping the accuracy low and remarkably stable no matter what size of training data I tried. Here's the update: http://www.fastcolabs.com/3012908/tracking/im-beating-the-ns...
