Is it possible to comply with GDPR while using this to store data? Given that it operates like an append-only log, is it possible to actually remove data to comply with a GDPR request?
You can use cryptoshredding: have an encryption key for each user (stored outside of this ledger) and encrypt all PII with that key. Throw away the key if the user wants you to delete their data.
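A minimal sketch of that cryptoshredding idea, in Python. All names here (`KeyStore`, `append_pii`, `read_pii`) are hypothetical, and the cipher is a toy SHA-256 keystream used only so the example runs on the standard library. A real system would presumably use a vetted authenticated cipher such as AES-GCM and keep the keys in a proper key management service.

```python
import hashlib
import secrets
from typing import Optional


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-style keystream.

    Stand-in only (an assumption of this sketch): the operation is symmetric,
    so the same call encrypts and decrypts. Use a vetted AEAD in real code.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class KeyStore:
    """Mutable per-user key store, kept outside the ledger."""

    def __init__(self) -> None:
        self._keys = {}  # user_id -> 32-byte key

    def get_or_create(self, user_id: str) -> bytes:
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> Optional[bytes]:
        return self._keys.get(user_id)

    def shred(self, user_id: str) -> None:
        # Deleting the key is the "erasure": the ciphertext stays in the ledger.
        self._keys.pop(user_id, None)


ledger = []  # append-only: entries are never modified or removed


def append_pii(keys: KeyStore, user_id: str, plaintext: str) -> None:
    nonce = secrets.token_bytes(16)
    ciphertext = _toy_cipher(keys.get_or_create(user_id), nonce, plaintext.encode())
    ledger.append({"user": user_id, "nonce": nonce, "ciphertext": ciphertext})


def read_pii(keys: KeyStore, entry: dict) -> Optional[str]:
    key = keys.get(entry["user"])
    if key is None:
        return None  # key was shredded, so the entry is unreadable
    return _toy_cipher(key, entry["nonce"], entry["ciphertext"]).decode()
```

After `keys.shred("u1")`, `read_pii` returns `None` for every entry belonging to `u1`, even though the ledger itself was never touched.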
But then you must also plan for what happens when that encryption is broken. So I think you also need to control and protect your storage in order to make that a safe strategy.
The more I think about these things, the more I distrust cloud providers, and want my own hardware.
Do you really trust these companies enough to hand them the keys to all your data? Is there really any way to provide secrets to your app without trusting the hosting provider?
If your keys leaked, you'd probably have to assume you lost all of the data up to that point. To secure the data going forward, you'd need to generate a second key per user for all of the future data. Well, and hopefully shore up the security problems!
I agree, though, that an immutable ledger like this complicates things in a way that you-shouldn't-mutate-but-can datastores do not.
I think it's worse than just losing the data. If you operate a public cryptographic ledger holding users' data in the EU, and do it under some company name, you won't be able to comply with the "right to be forgotten", or whatever it's called.
I'm currently working on this problem as it applies to blockchains. The plan ATM is to implement cryptographic snapshots of the data, where the old transactions are erased but a proof of them remains available.
By not storing that type of data. If you need to store that type of data, you can also anonymize it, or replace it with keys and keep the actual values in a separate (mutable) database.
Also, for some (legal) purposes you are allowed to keep the data regardless, because other obligations require you to retain it.
Which is why you use it for specialised use cases and keep any PII out of there.
It would be possible to replay into a new ledger, filtering out the pieces of data to be deleted, but that goes against having an immutable log in the first place.
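To illustrate why a filtered replay undercuts the immutability guarantee, here is a rough Python sketch. It assumes the ledger is a simple SHA-256 hash chain, which may not match how this particular product works; `append` and `replay_without` are hypothetical helpers.

```python
import hashlib
import json


def append(ledger: list, payload: dict) -> None:
    """Append an entry whose hash covers the previous entry's hash."""
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    ledger.append({"prev": prev, "payload": payload, "hash": digest})


def replay_without(ledger: list, should_drop) -> list:
    """Build a new ledger from the old one, skipping entries to be deleted."""
    rebuilt = []
    for entry in ledger:
        if not should_drop(entry["payload"]):
            append(rebuilt, entry["payload"])
    return rebuilt


old = []
append(old, {"user": "u1", "note": "signup"})
append(old, {"user": "u2", "note": "signup"})
append(old, {"user": "u1", "note": "purchase"})

new = replay_without(old, lambda p: p["user"] == "u2")
```

Every entry after the dropped one gets a different hash in `new`, so any proof or receipt issued against the old chain no longer verifies.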
It's easy enough to store sensitive data externally (e.g., in a key value store) and simply store a reference to the data along with its hash in the ledger. When data needs to be removed, delete the data from your KV store and add an entry to the ledger noting that it was removed.
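A small Python sketch of that reference-plus-hash pattern. The in-memory `kv_store` dict stands in for a real key value store, and the function names are made up for illustration. The per-record salt is an addition of mine, not part of the suggestion above: without it, a plain hash of something guessable like an email address could be brute-forced from the ledger alone.

```python
import hashlib
import secrets
import uuid

kv_store = {}  # mutable store: ref -> (salt, data)
ledger = []    # append-only: holds only references, hashes and events


def store_sensitive(data: bytes) -> str:
    """Put the data in the KV store and record a reference plus hash."""
    ref = str(uuid.uuid4())
    salt = secrets.token_bytes(16)
    kv_store[ref] = (salt, data)
    digest = hashlib.sha256(salt + data).hexdigest()
    ledger.append({"event": "stored", "ref": ref, "sha256": digest})
    return ref


def remove_sensitive(ref: str) -> None:
    """Delete the data (and its salt) and note the removal in the ledger."""
    kv_store.pop(ref, None)
    ledger.append({"event": "removed", "ref": ref})


def verify(ref: str) -> bool:
    """True if the KV data still exists and matches the recorded hash."""
    if ref not in kv_store:
        return False
    salt, data = kv_store[ref]
    recorded = next(
        e["sha256"] for e in ledger
        if e["ref"] == ref and e["event"] == "stored"
    )
    return hashlib.sha256(salt + data).hexdigest() == recorded
```

The ledger keeps a complete history of what was stored and removed, while the sensitive bytes only ever live in the mutable store.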
But you probably wouldn't store sensitive user data in this kind of database anyway. Not every use case is well-suited for a ledger like this. In most applications, this would be pointless overhead.
There are two parts to every PII-storing system. The first is the actual PII store, which is very small, "mutable" in your terminology, locked down so nobody can access it without raising an alarm, and usually not accessed at all except for a few very limited use cases, including GDPR ones. The rest of the store just uses references to the entities sitting in the GDPR store, like a numeric ID (a foreign key in SQL terminology). This way any data store (SQL, data lake, etc.) can easily be GDPR compliant without needing to delete data in the large data stores. It also increases security, because a breach of the large data stores does not give access to the GDPR data.
If you tie a user to a uuid separately from where you are logging the transactions, you can nullify the existing UUID link to the given user and be in full compliance with GDPR.
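Roughly what that separation could look like in Python. This is only a sketch: the two in-memory structures stand in for a locked-down identity store and the append-only transaction log, and all names are hypothetical.

```python
import uuid
from typing import Optional

identity_store = {}   # small, mutable, locked down: uuid -> PII
transaction_log = []  # append-only: refers to users only by uuid


def register(name: str, email: str) -> str:
    """Create the uuid link for a new user and return it."""
    user_ref = str(uuid.uuid4())
    identity_store[user_ref] = {"name": name, "email": email}
    return user_ref


def record(user_ref: str, action: str, amount: float) -> None:
    """Log a transaction without any PII, only the opaque reference."""
    transaction_log.append({"user": user_ref, "action": action, "amount": amount})


def forget(user_ref: str) -> None:
    """Handle an erasure request by removing the uuid-to-person link."""
    identity_store.pop(user_ref, None)


def resolve(user_ref: str) -> Optional[dict]:
    """Look up who a uuid belongs to, or None once the link is gone."""
    return identity_store.get(user_ref)
```

After `forget(ref)`, the logged transactions still exist and still add up, but `resolve(ref)` returns `None`. Whether the remaining log entries count as anonymous depends on what else they contain, so free-text fields and similar should stay out of the log.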
The GDPR "right to be forgotten" has an exception clause that defers to regional accounting compliance laws, such as those requiring a credit card transaction to be retained for 5 years.