by nsagent 666 days ago
If anything it sounds like "related" is not what they are actually doing. Rather they are looking at ways to uniquely fingerprint users through optimizing how they split "related" sites.

Reminds me of the research that shows that 87% of people in the US can be uniquely identified with only three pieces of information: date of birth, gender, and zip code [1].

[1]: https://dataprivacylab.org/projects/identifiability/paper1.p...


That seems to be saying it is extremely likely that the only other person in my zip code who shares my birthdate is of the opposite gender.
Only 50% of the time, but that’s still a 50% better guess than you’d make without knowing gender.

ZIP codes contain maybe 40K residents [0] (many contain fewer) and there have been around 25K days in the last 70 years. Sure, births are not evenly distributed, but still...

[0] https://www.unitedstateszipcodes.org/images/comparison-of-po...
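To make that arithmetic concrete, here is a rough sketch using only the figures quoted above. The even spread of births over days and the even two-way gender split are simplifying assumptions, not claims about real data.

```python
# Back-of-envelope check; every input is a rough figure from the comment above.
residents_per_zip = 40_000   # upper-end ZIP population quoted above
days_in_70_years = 25_000    # "around 25K days" (70 * 365.25 is closer to 25,567)
genders = 2                  # assumption: an even two-way split

# Average number of residents sharing one date of birth (year included).
people_per_dob = residents_per_zip / days_in_70_years

# Average number sharing both date of birth and gender.
people_per_dob_and_gender = people_per_dob / genders

print(f"{people_per_dob:.2f} people per date of birth")
print(f"{people_per_dob_and_gender:.2f} people per (date of birth, gender)")
```

So even for a large ZIP code, the average cell holds fewer than one person once gender is included, under these assumptions.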

That sounds like a pitch for one of those "singles near you" apps. Find hot women in your area who share your birthdate!
statistically, 50% chance, innit?
OP seems to claim 13% same / 87% opposite
I don't think you can make that conclusion.

I think you're making the assumption that all three data points are needed for all 87%. But obviously some people can be uniquely identified based on just {zip, date of birth}, such that gender isn't necessary.

So the distribution could e.g. be 8% same, 8% opposite, 5% both, 79% neither, and explain the original numbers without triggering the paradox.
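A quick sanity check of that example split, as a sketch. It assumes the four buckets describe who else shares your zip and date of birth (same gender only, opposite gender only, both, or nobody); that reading and the bucket names are mine, not spelled out in the comment.

```python
# Hypothetical buckets: who else shares your {zip, date of birth}, in percent.
shares = {"same_only": 8, "opposite_only": 8, "both": 5, "neither": 79}
assert sum(shares.values()) == 100

# You are unique on {zip, date of birth, gender} if nobody of your gender matches.
unique_with_gender = shares["opposite_only"] + shares["neither"]

# You are unique on {zip, date of birth} alone only if nobody matches at all.
unique_without_gender = shares["neither"]

# Share of people for whom knowing gender is what makes them unique.
gender_was_decisive = unique_with_gender - unique_without_gender

print(unique_with_gender, unique_without_gender, gender_was_decisive)
```

Under that reading the split reproduces the 87% headline figure while gender is the deciding factor for only a small slice of people.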

Yeah, I was off, but by their numbers:

87% of the time, there are no others on my birthdate or there is one other and opposite gender.

13% of the time, there is one other of the same gender, or more than one other of either or both genders.

Really? That's odd. The typical zip code has a population of about 9000. Dates of birth are about evenly distributed, so you'd still get about 24 people/birthday, or around 12 men or women per birthday per zip code. I might be off by a fair amount in either direction, but I don't think I'd be twelve times off.
Dates of birth are not evenly distributed.

To clarify: your date of birth includes the year. It’s more specific than your birthday, which we usually think of as just day & month.

Also, the difficulty of identifying someone probably looks like a power-law curve, meaning that most of the "total difficulty" is concentrated in a small group, the ~13% that can't be identified.

In other words, even if one person is extraordinarily tricky to find [0], their share of the total un-findable-ness does not diffuse outwards to help anybody else.

[0] http://tailsteak.com/archive.php?num=433

Oh, ok, I didn't realize that the data included the year. Never mind, I don't know the US age distribution well enough to have any idea of how plausible it is; I withdraw my comment.
birthday != date of birth
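A minimal sketch of why the distinction matters so much, assuming a zip code of 9000 people (the figure used above), a 70-year age span, two equally likely genders, and births spread uniformly and independently over days. None of those assumptions hold exactly in real data, and `p_unique` is just an illustrative helper.

```python
def p_unique(population: int, cells: int) -> float:
    """Chance that none of the other residents falls in your cell,
    assuming everyone is spread uniformly and independently over the cells."""
    return (1 - 1 / cells) ** (population - 1)

population = 9_000             # assumed zip code population
birthday_cells = 365 * 2       # (day and month, gender)
dob_cells = 70 * 365 * 2       # (day, month and year, gender) over ~70 years

print(f"unique by birthday + gender:      {p_unique(population, birthday_cells):.6f}")
print(f"unique by date of birth + gender: {p_unique(population, dob_cells):.2f}")
```

With day and month only, almost nobody is unique in this toy model; once the year is included, most people are.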