Hacker News new | ask | show | jobs
by femto113 2255 days ago
The system doesn't need to ship every key to every phone, much more compact structures like Bloom filters could be used instead. If we assume about 1000 positives per day and each positive uploading 14 days of keys at 4 keys per hour that's a bit over 1 million keys per day. A Bloom filter with a false positive rate of 1/1000 could store that in about a megabyte. Phone downloads the filter each day and checks its observed keys, and only needs to download the actual keys if there's a potential match.
3 comments

The main issue of bloom filters is this:

> only needs to download the actual keys if there's a potential match.

One of the design constraints of the service was that it should not know your (suspected) infection status unless you give consent that it should be shared.

> Matches must stay local to the device and not be revealed to the Diagnosis Server.

https://covid19-static.cdn-apple.com/applications/covid19/cu...

The better the bloom filter is, the more likely it is that you have actually been in contact with a key if the bloom filter is positive.

Furthermore, the bloom filter has to deal with a lot more keys. In fact, in your example of 1000 positives per day uploading 14 days of keys you only need to upload 14 keys as they only rotate once per day. At 16 bytes per key (as the link above specifies), you'd have to download 14 * 1000 * 16 = 224kb, much less than the bloom filter needs. And this scheme can tell you with 100% certainty whether there has been a match or not, so at least in your example it's much better than bloom filters.

The scalability issues that exist only manifest themselves at larger numbers than 1000 infections per day, say upper tens to lower hundreds of thousands where it starts becoming a problem.

So yes, rough location as moxie suggests is the best method to improve the scheme. Instead of checking the IDs of people thousands or hundreds of km away from you, you could just check the IDs of people in your US state or county. But it has to be smart enough to recognize movement, as in, you need to upload/download all areas you've been in and people living at the borders automatically stand out because they download two or three areas.

I see lots of ways to mitigate requesting the keys from disclosing much information:

- could set the false positive rate higher than the chance of encountering a case in the wild (which makes it smaller)

- phone could be programmed to sometimes randomly request keys even when the filter doesn't match

- keys could be distributed across many static mirrors and your phone could pick one at random if the filter matched

Nothing prevents the user from pushing the checking to some trusted service as well, if they so choose. If they trust the service then they'd upload their seen keys to a checking service, rather than downloading the whole set of diagnosis keys. The important part is the decision is in their hands.
Bloom filters could work that direction as well: phone produces a filter of observed keys and uploads it to the service, service checks all positive keys to see if they're in the filter. I think the main point of doing the checking on the phone is that way you're the only one who knows if you've been exposed.
You need just one key per day, 15-minutes ids can be generated based on this. Bloom filter might be still useful though.