by raverbashing 1060 days ago
But the question is: how many IPs collide when only 4 digits of the hash are shown?

I think that's where the anonymity claims might have come from.

1 comments

Assuming an even distribution over four hex digits: 16^4 = 65,536 possible values, so across the 2^32 IPv4 address space each displayed value is shared by roughly 65k IPs. From my quick skim of the paper, the authors made some assumptions about posting tendencies (frequent posters are likely to comment on multiple topics) and looked for enriched patterns of IP addresses: an IP address that shows up on multiple topics within a short timeframe is more likely to be real. As a control set, they took a different four characters of the hash output (e.g. the true set samples [10:14], the false set samples [11:15]) and used that as their statistical threshold.
Sounds like there's some margin for plausible deniability there, especially if they can make it match different universities in the same range.
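
For anyone who wants to check the arithmetic, here's a rough Python sketch of the numbers and the slicing idea. Big caveats: I'm using SHA-256 purely as a stand-in because I don't know which hash the site actually uses, the function names are made up for illustration, and the [10:14] / [11:15] offsets are just the example positions from my skim, not necessarily the paper's exact ones.

```python
import hashlib

IPV4_ADDRESSES = 2 ** 32  # size of the IPv4 address space


def bucket_count(hex_digits: int = 4) -> int:
    """Number of distinct values that `hex_digits` hex characters can take."""
    return 16 ** hex_digits


def ips_per_bucket(hex_digits: int = 4) -> int:
    """Average number of IPv4 addresses sharing one displayed value,
    assuming the hash spreads addresses evenly."""
    return IPV4_ADDRESSES // bucket_count(hex_digits)


def hash_slice(ip: str, start: int, stop: int) -> str:
    """Slice of the hex digest of an IP string.
    ASSUMPTION: SHA-256 stands in for the site's real (unknown) hash."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()[start:stop]


def true_tag(ip: str) -> str:
    """Hypothetical 'displayed' four characters, positions [10:14]."""
    return hash_slice(ip, 10, 14)


def control_tag(ip: str) -> str:
    """Hypothetical control four characters, shifted by one: [11:15]."""
    return hash_slice(ip, 11, 15)


if __name__ == "__main__":
    print("possible 4-digit values:", bucket_count())
    print("IPv4 addresses per value (average):", ips_per_bucket())
    ip = "192.0.2.1"  # documentation-range address, not a real poster
    print("true tag:", true_tag(ip), "control tag:", control_tag(ip))
```

The point of the shifted slice is that a candidate IP matching the control characters can only be coincidence, so the match rate on the control set gives a baseline for how often random collisions happen.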