by nezzle 3024 days ago
irrelevant.

In fact, knowing just four random pieces of information was enough to reidentify 90 percent of the shoppers as unique individuals and to uncover their records, researchers calculated.

https://bits.blogs.nytimes.com/2015/01/29/with-a-few-bits-of...

2 comments

I’ve done research on credit card data like that. I can tell you both experientially and mathematically that four bits of random information is insufficient to identify people. The information was not anonymized and they were tracking people engaging in a common, narrow activity. Not only that, but they were only tracking 1.1 million individuals. They had a relatively small search space and significant non-random information with which to bootstrap the deanonymization. Calling that “four bits” is disingenuous.

Contrast this with trying to identify a single individual in a population with no other information about them. It would take about 33 bits if we knew absolutely nothing about her, given log_2(7,280,000,000) ≈ 32.8. But we know she’s American, so we can cut our search space down to 322,000,000. That leaves us with about 28 bits. We also know she’s a woman, so we can cut our search space down by 50%, which removes exactly one bit. Now we have about 27 bits to go. I can virtually guarantee you an analysis of anonymous donation patterns will not meaningfully cut down the search space beyond a few more bits, and that’s exceptionally non-random data. The more useful information is knowing that she resides in New Hampshire, but that still only brings us down to approximately 20 bits.
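
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The helper name `bits_to_identify` is made up for illustration. The world, US, and 1.1 million figures are the ones quoted in this thread; the New Hampshire population (roughly 1.33 million) is an assumption on my part, not a number from the comment.

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one person among `population`."""
    return math.log2(population)


# Search spaces discussed in the thread. The New Hampshire figure is an
# assumed, approximate population and does not come from the comment.
search_spaces = {
    "world": 7_280_000_000,
    "USA": 322_000_000,
    "USA, women (50%)": 322_000_000 // 2,
    "New Hampshire (assumed ~1.33M)": 1_330_000,
    "shoppers in the study": 1_100_000,
}

for label, population in search_spaces.items():
    print(f"{label}: {bits_to_identify(population):.1f} bits")
```

Each halving of the search space removes exactly one bit, which is why knowing someone's sex only moves the estimate by one. The same function also shows that a data set of 1.1 million people starts at roughly 20 bits, far below the 33 bits for the whole world.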

It's not unreasonable to try to make something a bit more difficult even if you understand it won't stop determined attackers. There's a middle ground between 'doing nothing' and 'making it impossible'.