| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jandrewrogers 378 days ago

> As an engineer, I find it neat that figuring out how to delete data is often a more complicated problem than figuring out how to create it.

Unfortunately, this is a deeply hard problem in theory. It is not as though it has not been thoroughly studied in computer science. When GDPR first came out I was actually doing core research on “delete-optimized” databases. It is a problem in other domains. Regulations don’t have the power to dictate mathematics.

I know of several examples in multiple countries where data deletion laws are flatly ignored by the government because it is literally impossible to comply even though they want to. Often this data supports a critical public good, so simply not collecting it would have adverse consequences to their citizens.

tl;dr: delete-optimized architectures are so profoundly pathological to query performance, and a lesser extent insert performance, that no one can use them for most practical applications. It is fundamental to the computer science of the problem. Denial of this reality leads to issues like the above where non-compliance is required because the law didn’t concern itself with the physics of computation.

If the database is too slow to load the data then it doesn’t matter how fast your deterministic hard deletion is because there is no data to delete in the system.

Any improvements in the situation are solving minor problems in narrow cases. The core theory problems are what they are. No amount of wishful thinking will change this situation.

2 comments

Gigachad 377 days ago

Instantaneous deletes might be impossible, but I really doubt that it’s physically impossible to eventually delete user data. If you soft delete first to hide user data, and then maybe it takes hours, weeks, months to eventually purge from all systems, that’s fine. Regulators aren’t expecting you to edit old backups, only that they eventually get cleared in reasonable time.

Seems that companies are capable of moving mountains when the task is tracking the user and bypassing privacy protections. But when the task is deleting the users data it’s “literally impossible”

link

alisonatwork 377 days ago

It would be interesting to hear more about your experience with systems where deletion has been deemed "literally impossible".

Every database I have come across in my career has a delete function. Often it is slow. In many places I worked, deleting or expiring data cost almost as much as or sometimes more than inserting it... but we still expired the data because that's a fundamental requirement of the system. So everything costs 2x, so what? The interesting thing is how to make it cost less than 2x.

link