|
The problem with homomorphic encryption in this case is that you need access to the private key in order to interpret the results of the computation. Joe Public encrypts queries using a public key, and the query runs on a possibly compromised server without the attackers learning anything about the database or the query, but then the result of the query looks like absolute gibberish to Joe Public. Giving Joe Public access to the private key necessary for interpreting the query result allows attackers to inspect all of the intermediate states of the query finite state machine, which allows debugging and inspection just as if homomorphic encryption wasn't in use. I suppose the routing proxy could hold the private key and decrypt the query result for the general public. However, the location of the routing proxy is almost certainly going to be compromised before the locations of the servers executing the queries, so in the decrypting proxy scenario, the attackers will almost certainly have the secret keys before they get access to the boxes executing the queries. There's also the problem that the messages being decrypted are the final states of finite state machines that executed the queries, so the messages to be copied over the network add up in size to at least the size of the dataset being queried. (The data can be sharded into many smaller databases, and almost certainly would be in order to speed up the homomorphic computation steps, but this doesn't cut down on the amount of network traffic necessary to retrieve all search results for a single query. A simple query on 1 TB of data, split into 10,000 databases each of 100 MB would require copying and remotely decrypting 10,000 messages, each over 100 MB in size.) It might be possible to discover a homomorphic encryption scheme whereby knowledge of the private key allows one to devise a mapping from a higher dimensional finite state to a lower dimensional finite state machine, where the secret key for the smaller dimensional machine doesn't leak information about the secret key for the larger machine. In this case, it may be possible to perform some finishing operations on the query to prepare it for conversion to the smaller state machine and give the public the private key to the smaller state machine so that the query result could be read from the machine by the public without the public being able to observe intermediate states of the query computation. However, I believe this is far beyond our current mathematical understanding. |
This doesn't apply to TPB, but one could give each user of, say, an email webapp the private key to his/her own data while still facilitating server-side search.
> I suppose the routing proxy could hold the private key and decrypt the query result for the general public. However, the location of the routing proxy is almost certainly going to be compromised before the locations of the servers executing the queries.
This wouldn't be completely useless since it lets you offload much of the storage and computation onto commodity cloud providers without revealing what's on the machines, even if they're scanning your RAM. From the article it seems like TPB is getting some kind of utility out of such a scheme: "All virtual machines are hosted with commercial cloud hosting providers, who have no clue that The Pirate Bay is among their customers. All traffic goes through the load balancer, which masks what the other VMs are doing."
> There's also the problem that the messages being decrypted are the final states of finite state machines that executed the queries, so the messages to be copied over the network add up in size to at least the size of the dataset being queried.
In a homomorphic encryption scheme that supported querying, only the encrypted results would need to be relayed back from each search shard, no?