Hacker News
by dawnbreez 3817 days ago
We should not use statistical inference because statistics do not always paint an accurate picture.

An extreme example is that many criminals enjoy ice cream; obviously, it's a fallacy to say that all ice cream eaters are therefore criminals. But what about correlations between crime and race, sexual preference, location? If my neighbor is a terrorist, am I his accomplice? Is everyone who lives in a bad neighborhood a crook, or are some of them trying to scrape by legally?

The same argument seems to appear on every single thread about surveillance: data does not lie, therefore we cannot lose if we use data to solve crimes. The problem, friend, is that data does not lie--humans do. We lie to ourselves all the time, because the patterns fit and we are creatures that survive by pattern recognition. We see patterns even where there are none.

1 comment

As you mentioned, it's also very much possible to draw the wrong conclusions. So while data may not lie, there are certainly plenty of faulty interpretations.

What scares me is that the data aggregation is being done on mere promises of "trust us, we won't look". Systems built this way are inherently vulnerable to secondary and tertiary usage. That's why I think there is a strong case for end-to-end encryption, so that there is separation. Ultimately, though, this is still a problem, because there is no way to enforce metadata constraints on data. Someone may design such a system, but others will just resort to using something else. Frustrating.