by api 1479 days ago
Is this actually possible? Couldn't you make many repeated queries and slowly decrypt the text by e.g. slowly narrowing the range?
3 comments

This is possible. The goal is that the server knows as little as possible, while the client has full information. It's order-revealing encryption: the server knows the ordering of the values, but doesn't know any specific value. When queried, it always receives prefixes (or exact matches) encrypted under the same scheme, so it can compare them to the corpus and select results, since the query parameters fall into the same ordering. The server doesn't have access to the keys needed to generate query parameters, so in theory it would be difficult for the server to perform narrowing queries on its own. Over time, though, the server could gather statistics that reveal more about the data it's holding. Also, these schemes may need to produce the same ciphertext for the same input, so frequency distributions can be used to reveal information.
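
To make the frequency point concrete, here is a minimal sketch in Python. It assumes the scheme is deterministic, and it uses HMAC only as a stand-in for "same input, same ciphertext"; the key, the sample column, and the function names are all made up for illustration and say nothing about how the product in the article actually works.

```python
import hashlib
import hmac
from collections import Counter

def det_token(key: bytes, value: str) -> str:
    """Deterministic stand-in for an encrypted field: same input, same output."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def guess_by_frequency(tokens, public_ranking):
    """Pair the most common tokens with the most common expected plaintexts."""
    ranked_tokens = [t for t, _ in Counter(tokens).most_common()]
    return dict(zip(ranked_tokens, public_ranking))

# Hypothetical column; the server only ever sees `stored`, never `key`.
key = b"client-only-secret"
column = ["flu"] * 5 + ["asthma"] * 3 + ["gout"] * 1
stored = [det_token(key, v) for v in column]

# The attacker knows (e.g. from public statistics) the rough popularity order.
guesses = guess_by_frequency(stored, ["flu", "asthma", "gout"])
recovered = [guesses[t] for t in stored]
```

With a skewed distribution like this one, ranking tokens by count is enough to label every row without touching the key.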
Yeah, the article is very thin on technical details. To make this work as they describe, it must not be possible for any client to "forge" queries, or else they could trivially decode the content by sending prefix queries of increasing length.

It's also difficult to see how this could work on the server side without exposing some information about the encrypted fields. For example, if all documents have a value that begins with "a", then there must exist a prefix query that matches all those documents. I would expect it to be possible to figure out whether such a query is possible or not, only given access to the encrypted data, but even if that's not possible, the simple fact that a prefix query was issued that matched all documents gives away that information.
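
A rough sketch of the increasing-length prefix attack mentioned above, in Python. It assumes the attacker can issue arbitrary prefix queries and see which documents match; the oracle, the sample values, and the lowercase alphabet are hypothetical.

```python
import string

def make_prefix_oracle(secret_values):
    """Simulates a server answering prefix queries with matching document ids."""
    def matches(prefix):
        return [i for i, v in enumerate(secret_values) if v.startswith(prefix)]
    return matches

def recover(oracle, doc_id, alphabet=string.ascii_lowercase, max_len=32):
    """Extend a known prefix one character at a time while doc_id still matches."""
    known = ""
    while len(known) < max_len:
        for ch in alphabet:
            if doc_id in oracle(known + ch):
                known += ch
                break
        else:
            break  # no extension matches, so the whole value is known
    return known

# The attacker never reads the values directly, only the match results.
oracle = make_prefix_oracle(["alice", "albert", "bob"])
recovered = [recover(oracle, i) for i in range(3)]
```

Each character costs at most one query per alphabet symbol, so the work grows linearly with the length of the value rather than exponentially.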

You could have a larger range than domain and throw in some noise. Exact match queries would need to become range queries that are de-noised at decryption.
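
A toy sketch of that idea in Python, just to show the mechanics. It is not encryption: there is no key, and `GAP`, `encode`, `decode`, and `exact_match_range` are names invented for this illustration. Each plaintext owns a block of code points, a random offset inside the block is the noise, and an exact match turns into a range query over the block.

```python
import random

GAP = 1000  # hypothetical: each plaintext value owns GAP consecutive code points

def encode(x, rng):
    """Map x to a random point inside its own block, preserving order across blocks."""
    return x * GAP + rng.randrange(GAP)

def decode(c):
    """De-noise by dropping the offset inside the block."""
    return c // GAP

def exact_match_range(x):
    """An exact-match query for x becomes an inclusive range over x's block."""
    return x * GAP, x * GAP + GAP - 1

rng = random.Random(0)
values = [3, 7, 7, 12]
codes = [encode(v, rng) for v in values]

lo, hi = exact_match_range(7)
hits = [decode(c) for c in codes if lo <= c <= hi]
```

Equal plaintexts are likely to get different codes here, which blunts simple frequency counting, but ordering is still visible to whoever holds the codes.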
Yes. This is the fundamental problem with this.

For something like HIPAA, this adds very little value if fields are semi-known.