Hacker News new | ask | show | jobs
by marta_morena_24 2211 days ago
Why would you have to process "all" the rows equally? It should be more than sufficient for almost any practical scenario to use some constant factor of "scattering", i.e. access 100 random rows + 1 row that you actually are interested in.

It will probably be more complicated than that, due to repeated queries revealing patterns, but there are other mitigations for that, some of which are sure being developed.

2 comments

You could trade off some security for speed like that, but 100 rows wouldn’t cut it. Ask for the same information again and you’ll probably only have 1 intersection. Even if you read half of the rows unnecessarily, you could figure out the true row in between 2 and approximately log2(rows) reads. Just isn’t worth the side channel attack.
If it's the same query, it would be the same 100 random rows, wouldn't it? Then repeated queries wouldn't leak.
No, if you run the same query the server must not be able to determine that it was the same query. Repeated invocations of the same question would be transformed with added noise during encryption so that the encrypted query and answer is different each time.

If the same query would access the same 100 random rows, then after seeing your (hidden) query and getting the 100-random-row signature it would be trivial to run a bunch of candidates and see which one of them gets the same 100-random-row signature and thus figure out what you queried.

If you omit half the rows, you may already have gained a lot of information about the retrieval purpose.

Retrieval purposes do not scale by the number of rows needed to be retrieved.

You also run into the problem that different retrieval purposes require different number of rows to be retrieved.