Hacker News
by tumanian 2533 days ago
The article assumes that the noise was added intentionally for obfuscation. However, for real-time size estimation Facebook would have to rely on some sort of probabilistic data structure, such as sketches, and it's quite possible that the authors are observing the accuracy loss that comes from these data structures.

One can argue that FB doesn't need probabilistic data structures to estimate the size of a small set of externally provided PII, but they probably need to keep them at hand in case they need to intersect with geo-demographic sets. E.g., one uploads a large list of emails and wants to intersect that audience with the set of males in San Francisco.
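
To make the point concrete, here is a rough Python sketch of one such structure, a k-minimum-values (KMV) sketch. This is my own choice for illustration; I have no idea which sketch FB actually uses, and the function names and the parameter `k` are made up for this example. It shows how both the size estimate and the intersection estimate carry estimator error with no deliberate noise added.

```python
import hashlib


def hash01(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) using SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(items, k: int) -> list:
    """Keep only the k smallest distinct hash values of the set."""
    return sorted({hash01(x) for x in items})[:k]


def estimate_cardinality(sketch: list, k: int) -> float:
    """Estimate the number of distinct items from a KMV sketch."""
    if len(sketch) < k:
        # Fewer than k distinct items were seen, so the count is exact.
        return float(len(sketch))
    # Standard KMV estimator: (k - 1) divided by the k-th smallest hash.
    return (k - 1) / sketch[-1]


def estimate_intersection(sketch_a: list, sketch_b: list, k: int) -> float:
    """Estimate |A ∩ B| as (estimated Jaccard) * (estimated |A ∪ B|)."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The k smallest hashes of the union form a valid KMV sketch of A ∪ B.
    union = sorted(set_a | set_b)[:k]
    if not union:
        return 0.0
    shared = sum(1 for v in union if v in set_a and v in set_b)
    jaccard = shared / len(union)
    return jaccard * estimate_cardinality(union, k)


if __name__ == "__main__":
    k = 2048  # hypothetical sketch size; larger k means less error
    # Hypothetical data: an uploaded email list and a geo-demo audience.
    uploaded = [f"user{i}@example.com" for i in range(20_000)]
    audience = [f"user{i}@example.com" for i in range(10_000, 40_000)]
    a = kmv_sketch(uploaded, k)
    b = kmv_sketch(audience, k)
    print("estimated upload size:", estimate_cardinality(a, k))
    print("estimated overlap (true value is 10000):",
          estimate_intersection(a, b, k))
```

With sets smaller than `k` the answers are exact; above that, the estimates wobble around the true value by roughly a few percent at this `k`, which from the outside could look a lot like injected noise.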