Hacker News
by im3w1l 1927 days ago
Possibly? It could leak data in a lot of ways. For instance you might see that a particular IP is accessing particular rows.

Or you might see that certain rows are accessed together. For instance, if rows represent users, you could look at the "shape" of the accesses and try to match it to the shape of a social network you know from elsewhere. As an unrealistic, reduced example: maybe you know that Alice is friends with Bob, Charlie and Dennis, and that Bob is friends with Charlie. Now you see that W has 3 friends, X, Y and Z, of which X and Z are friends with each other. Then W might be Alice, X and Z might be Bob and Charlie (but we don't know which is which), and Y might be Dennis.

    . ABCD
    A  xxx
    B x x
    C xx
    D x

    . XYZW
    X   xx
    Y    x
    Z x  x
    W xxx
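
The matching step amounts to searching for a relabelling of the observed graph that lines up edge-for-edge with the known one (a graph isomorphism). A minimal Python sketch, assuming the two friendship graphs above are given as edge sets; `candidate_mappings` is a hypothetical helper, and the brute-force search is O(n!), so it only works on toy graphs. A realistic attack would also have to tolerate missing and extra edges, which this exact-match sketch does not.

```python
from itertools import permutations

# Friendship graph known from elsewhere (A = Alice, B = Bob, C = Charlie, D = Dennis).
KNOWN = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")}
# Co-access graph observed on pseudonymous rows W, X, Y, Z.
OBSERVED = {("W", "X"), ("W", "Y"), ("W", "Z"), ("X", "Z")}


def undirected(edges):
    """Represent each edge as a frozenset so (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def candidate_mappings(known, observed):
    """Yield every relabelling known -> observed that maps edges onto edges exactly.

    Brute force over all permutations, so O(n!): only usable on toy graphs.
    """
    known, observed = undirected(known), undirected(observed)
    known_nodes = sorted({n for e in known for n in e})
    observed_nodes = sorted({n for e in observed for n in e})
    if len(known_nodes) != len(observed_nodes) or len(known) != len(observed):
        return  # different sizes: no exact match is possible
    for perm in permutations(observed_nodes):
        mapping = dict(zip(known_nodes, perm))
        if {frozenset(mapping[n] for n in e) for e in known} == observed:
            yield mapping


for m in candidate_mappings(KNOWN, OBSERVED):
    print(m)
```

It prints two mappings, `{'A': 'W', 'B': 'X', 'C': 'Z', 'D': 'Y'}` and `{'A': 'W', 'B': 'Z', 'C': 'X', 'D': 'Y'}`: both send Alice to W and Dennis to Y, and they differ only in whether Bob and Charlie are X or Z, which is exactly the "don't know which is which" ambiguity.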

Fair. I suppose it really depends on what the motivating use cases for this kind of thing are. In my head, the killer app would be a mail server where I decrypt my email (from GPG) in my local client, then re-encrypt it (with homomorphic encryption, HE) and prepare indexing information, which is sent to a cloud service for long-term archival. The cloud service lets me search and retrieve my email without ever seeing the contents or knowing any of the metadata other than when I uploaded the messages and their lengths (and even that could be scrambled up a fair bit by adding arbitrary padding and periodically re-uploading old messages). In this scenario there isn't really the "known from elsewhere" info: knowing that certain groups of messages are returned together in response to sets of queries is unlikely to be meaningful if you don't have anything else to map that info to.
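
The length-hiding part can be sketched by padding each message up to a fixed-size bucket before it is encrypted, so the server only learns which bucket a message falls into rather than its exact length. A minimal Python sketch; the power-of-two buckets, the 1024-byte minimum and the 4-byte length prefix are assumptions for illustration, and the encryption step itself (GPG or HE) is deliberately left out.

```python
def pad_to_bucket(plaintext: bytes, min_bucket: int = 1024) -> bytes:
    """Pad a message to the next power-of-two size (at least min_bucket bytes).

    A 4-byte big-endian length prefix records the true length so the client can
    strip the padding later. The result is meant to be encrypted afterwards, so
    zero filler bytes are fine: the server never sees them in the clear.
    """
    body = len(plaintext).to_bytes(4, "big") + plaintext
    size = min_bucket
    while size < len(body):
        size *= 2
    return body + bytes(size - len(body))


def unpad(padded: bytes) -> bytes:
    """Recover the original message from pad_to_bucket's output."""
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]


for msg in (b"short note", b"x" * 1500, b"y" * 5000):
    padded = pad_to_bucket(msg)
    assert unpad(padded) == msg
    print(len(msg), "->", len(padded))
```

With these defaults, messages of 10, 1500 and 5000 bytes come out as 1024, 2048 and 8192 bytes. Power-of-two buckets keep the number of distinct sizes small at the cost of up to roughly doubling storage; finer buckets leak more but waste less.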

And thinking about it further, the client could pull other shenanigans to protect the user, like arranging for every query to return 20% "false" results that are filtered out before display, or storing duplicate copies of messages in the database so that the server thinks query X leads to result A and query Y leads to result B, without knowing that A and B are in fact the same thing.
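
Both tricks need only client-side bookkeeping. A minimal Python sketch, assuming a plain keyword index in which the server maps keyed keyword tokens (HMAC-SHA256 here, an assumption rather than anything HE-specific) to sets of opaque ids. `HypotheticalClientIndex`, the 20% decoy rate and the two-copies-per-message split are illustrative choices, and the encrypted message bodies are left out; in practice the server would also need a dummy encrypted blob for every decoy id so that fetching one looks normal.

```python
import hashlib
import hmac
import random
import secrets
from collections import defaultdict


class HypotheticalClientIndex:
    """Illustrative client-side obfuscation for a searchable mail archive.

    The server only ever sees `server_index`: keyed keyword tokens mapped to
    sets of opaque ids. The client alone knows which ids are decoys and which
    ids are copies of the same message.
    """

    def __init__(self, decoy_rate=0.2, copies=2, seed=None):
        self.key = secrets.token_bytes(32)      # client-only secret
        self.decoy_rate = decoy_rate
        self.copies = copies
        self.rng = random.Random(seed)
        self.server_index = defaultdict(set)    # this is what gets uploaded
        self.decoy_ids = set()                  # kept locally, never uploaded
        self.real_id_of = {}                    # opaque copy id -> real message id

    def _token(self, keyword):
        # Keyed hash so the server can match queries without learning the words.
        return hmac.new(self.key, keyword.encode(), hashlib.sha256).hexdigest()

    def _opaque_id(self):
        return secrets.token_hex(8)

    def add_message(self, msg_id, keywords):
        # Store the message as several unrelated-looking copies, each indexed
        # under only some of its keywords, so two queries hitting the same
        # message appear to return different results.
        copy_ids = [self._opaque_id() for _ in range(self.copies)]
        for cid in copy_ids:
            self.real_id_of[cid] = msg_id
        for i, kw in enumerate(sorted(keywords)):
            tok = self._token(kw)
            self.server_index[tok].add(copy_ids[i % self.copies])
            # With probability decoy_rate, also file a fresh decoy id under this
            # token, so on average about decoy_rate decoys accompany each real entry.
            if self.rng.random() < self.decoy_rate:
                decoy = self._opaque_id()
                self.decoy_ids.add(decoy)
                self.server_index[tok].add(decoy)

    def server_lookup(self, token):
        # Stand-in for the remote call: the server just returns the posting set.
        return set(self.server_index.get(token, set()))

    def search(self, keyword):
        hits = self.server_lookup(self._token(keyword))
        real = {h for h in hits if h not in self.decoy_ids}   # drop decoys
        return {self.real_id_of[h] for h in real}             # merge copies


idx = HypotheticalClientIndex(seed=1)
idx.add_message("m1", {"invoice", "alice"})
idx.add_message("m2", {"alice", "lunch"})
print(sorted(idx.search("alice")))  # ['m1', 'm2']
```

Because every keyword lands in exactly one copy, any single keyword still finds the message, while the server sees the copies as unrelated results.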

> In this scenario there isn't really the "known from elsewhere" info: knowing that certain groups of messages are returned together in response to sets of queries is unlikely to be meaningful if you don't have anything else to map that info to.

If the service knows the upload date of each message (presumably the same as the send date) and does this for a lot of users so it can cross-correlate, that's actually quite a bit to go on for figuring out who you are talking to.
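
A rough sketch of that cross-correlation in Python, assuming the server keeps a list of upload timestamps per account and that sender and recipient both archive a message within about a minute of it being sent. `rank_pairs`, the 60-second window and the sample data are all hypothetical. Pairs of accounts whose uploads repeatedly coincide float to the top of the ranking.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations


def coincidences(times_a, times_b, window):
    """Count uploads in times_a that have some upload in times_b within +/- window seconds."""
    times_b = sorted(times_b)
    count = 0
    for t in times_a:
        # Index range of times_b falling inside [t - window, t + window].
        lo = bisect_left(times_b, t - window)
        hi = bisect_right(times_b, t + window)
        if hi > lo:
            count += 1
    return count


def rank_pairs(uploads, window=60):
    """Rank account pairs by how often their upload times line up.

    uploads: dict mapping account -> list of upload timestamps in seconds
    (hypothetical input; the server would see this for every account it stores).
    """
    scores = [
        (coincidences(uploads[u], uploads[v], window), u, v)
        for u, v in combinations(sorted(uploads), 2)
    ]
    return sorted(scores, reverse=True)


uploads = {
    "alice": [100, 5000, 9000],
    "bob": [130, 5020, 9050],      # receives and archives Alice's mail within a minute
    "carol": [2000, 7000, 12000],
}
print(rank_pairs(uploads)[0])  # (3, 'alice', 'bob')
```

Because this signal comes from timing alone, padding message sizes does nothing against it.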