Encrypt your data and publish it to a public blockchain. Instead of ever giving out the data itself, you give out the right to do something with that data, by signing a method call on a smart contract platform. This is basically the capability security model applied to data in the cloud. Capabilities have well-known patterns for things like revoking access over time (see e.g. the Membrane Pattern).
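For anyone who hasn't seen capability-style revocation, here's a minimal sketch of a simple revocable forwarder (sometimes called a caretaker), which is a simpler relative of the membrane: a full membrane would also wrap every object that crosses the boundary. The names `LocationHistory`, `make_revocable` and `Revoked` are made up for illustration, and Python doesn't actually enforce encapsulation, so this only shows the shape of the pattern, not a secure implementation.

```python
class Revoked(Exception):
    """Raised when a revoked capability is used."""


class LocationHistory:
    """Hypothetical data object; the raw points are never handed out directly."""

    def __init__(self, cities):
        self._cities = list(cities)

    def last_city(self):
        return self._cities[-1]


def make_revocable(target):
    """Return (proxy, revoke). The proxy forwards public attribute
    lookups to `target` until revoke() is called."""
    state = {"target": target}

    class Proxy:
        def __getattr__(self, name):
            if name.startswith("_"):
                # Only forward the public interface.
                raise AttributeError(name)
            if state["target"] is None:
                raise Revoked("capability has been revoked")
            return getattr(state["target"], name)

    def revoke():
        state["target"] = None

    return Proxy(), revoke


history = LocationHistory(["Porto", "Lisbon"])
cap, revoke = make_revocable(history)
print(cap.last_city())  # the app can use the capability...
revoke()                # ...until the owner pulls it
```

The app only ever holds `cap`, never `history`, so the owner can cut off access without the app's cooperation.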
To prevent cases where the "something" that the smart contract does is "copy all your data bit-for-bit and upload it to my evil masters", the platform could perhaps measure the information entropy of the input and output data, and only allow the transaction if the output contains many fewer bits than the input. So say you have all of your location data and review history on the blockchain, encrypted with a key known only to you, and you want to grant an application the ability to recommend nearby restaurants that you might like. You authorize the transaction, and expect that the contract will release ~1 KB to you (a restaurant name, description, menu, reviews, and geocode) and will increase the data stored under its own ownership by 0 bytes. If it does anything else, it has broken the contract, and can be automatically penalized financially (because this is a blockchain, it inherently needs a cryptocurrency).
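A rough sketch of what that platform-level check might look like, with loud caveats: compressed size is used here as a crude stand-in for information content, the 1% budget is an arbitrary number picked for illustration, and `allow_transaction` is a hypothetical function, not part of any real smart contract platform. It also does nothing about a contract that leaks a little on each of many transactions.

```python
import os
import zlib


def compressed_bits(data: bytes) -> int:
    """Crude upper bound on information content: size after zlib compression."""
    return 8 * len(zlib.compress(data, 9))


def allow_transaction(input_data: bytes, released: bytes,
                      contract_storage_growth: int,
                      max_ratio: float = 0.01) -> bool:
    """Allow only if the contract kept nothing for itself and the released
    output carries far fewer bits than the input it was computed from."""
    if contract_storage_growth > 0:
        return False
    budget = max_ratio * compressed_bits(input_data)
    return compressed_bits(released) <= budget


# Stand-in for a user's decrypted location and review history.
history = os.urandom(200_000)
recommendation = b"Cafe Exemplo, Rua Augusta 1: seafood, 4.5 stars, 38.71,-9.14"

print(allow_transaction(history, recommendation, 0))  # small answer, nothing retained
print(allow_transaction(history, history, 0))         # bit-for-bit copy of the input
```

The first call fits the budget and the second doesn't, which is the whole point: a recommendation is fine, a dump of the source data is not.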
I've had an idea for something like this since hearing about Google's Federated Machine Learning research paper and reading the Ethereum spec, but have other more pressing projects right now and don't have time to implement it. If anyone feels it's interesting, feel free to steal - I'd still love to work on it at some point in the future, but there're still some holes in the idea (notably around the information theory & federated learning aspects) and another speculative research project isn't really what I'm looking for now.
Any dataset that changes (finance, health, etc.) will cost too much to store on a blockchain in any meaningful way. It’s expensive enough trying to store it on S3 or even Glacier. If the chain stores everything forever, it will get too expensive too fast.
I think a more realistic approach (but with way less money in it for speculators) is to store the PKI on a blockchain, then encrypt any blob anywhere and sign it. Send that signature to the smart contract and have it pull the blob from a non-blockchain store.
If it’s something that the owner has agency issues with (e.g. calculating a FICO score), then register the hashes of the data with a blockchain.
No need to store the data on the chain unless you’re worried about it disappearing.
Yeah, this assumes the existence of something like Filecoin/Storj that stores the actual data off-chain, with metadata & access keys on-chain. The blockchain is used to validate the integrity of the data blob and to financially compensate the host(s) that are physically storing the data.
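To make the split concrete, here's a toy sketch of the integrity half: the blob lives off-chain and only its hash is registered on-chain. Both stores are plain dicts standing in for the real thing, and `publish` / `fetch_verified` are hypothetical names; the payment side and the actual Filecoin/Storj APIs are left out entirely.

```python
import hashlib
import hmac

chain_registry = {}   # stand-in for on-chain metadata: (owner, blob_id) -> sha256 hex
off_chain_store = {}  # stand-in for the off-chain host: blob_id -> encrypted bytes


def publish(owner: str, blob_id: str, encrypted_blob: bytes) -> None:
    """Store the blob off-chain and register only its hash on-chain."""
    off_chain_store[blob_id] = encrypted_blob
    chain_registry[(owner, blob_id)] = hashlib.sha256(encrypted_blob).hexdigest()


def fetch_verified(owner: str, blob_id: str) -> bytes:
    """Pull the blob from the off-chain host and check it against the chain."""
    blob = off_chain_store[blob_id]
    expected = chain_registry[(owner, blob_id)]
    actual = hashlib.sha256(blob).hexdigest()
    if not hmac.compare_digest(actual, expected):
        raise ValueError("blob does not match the hash registered on-chain")
    return blob


publish("alice", "reviews-2018", b"<ciphertext bytes>")
print(fetch_verified("alice", "reviews-2018") == b"<ciphertext bytes>")

# A host that swaps or corrupts the blob gets caught on the next fetch.
off_chain_store["reviews-2018"] = b"<tampered bytes>"
try:
    fetch_verified("alice", "reviews-2018")
except ValueError as err:
    print("rejected:", err)
```

The chain only ever holds a 32-byte digest per blob, so storage cost on-chain stays flat no matter how big the data gets.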
This isn't making a whole lot of sense to me. Capabilities seem to be about making sure that sandboxed software doesn't get unauthorized access; it presumes you have a trusted environment (whether software, OS, or hardware) to enforce the capabilities.
Meanwhile, blockchain computation is about getting useful work out of untrusted participants. It doesn't seem like a fit.
Also, how do you do any calculation at all without decrypting the data? Or if you're thinking homomorphic encryption, what does a blockchain have to do with it?
It's cool, but the older I get the less time I want to spend learning tricks like this, in the same way that I don't want to waste as much time learning the intricacies of new videogames. Technology can be liberating but only in proportion to the amount of time people can invest in it. If technological liberation is just about getting an asymmetrical advantage for oneself and not extending that to everyone else (without demanding that they become experts in this latest way of gaming the system) then it's trash.