
by athenot 2590 days ago

> As someone who has worked on large scale saas, I say: there is zero, 0, ZERO, 0.00 chance of that data ever actually being deleted

Unless it's built in by design. I also happen to work on a large-scale SaaS where we take this stuff very seriously, and I can say it is possible to protect this data. I will grant that this adds considerable complexity, but for some organizations it is totally worth it.

> need to retain data to fulfill government requests

That's a choice, not a requirement. If you encrypt the data and purposely don't store the keys yourself, but instead have the customer hold them, then you have nothing of value to hand over to the government.
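A minimal sketch of that idea, assuming the customer holds a per-tenant key and the service only ever stores ciphertext. The names `client_encrypt`, `client_decrypt` and `ServiceStore` are hypothetical, and the SHAKE-256 keystream plus HMAC construction is a toy stand-in chosen only because it needs nothing beyond the Python standard library; a real deployment would use a vetted AEAD such as AES-GCM from an audited crypto library.

```python
import hashlib
import hmac
import os

# TOY cipher for illustration only: SHAKE-256 keystream XOR, then HMAC-SHA256
# over nonce + ciphertext (encrypt-then-MAC). Not a substitute for a vetted AEAD.

def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Derive separate encryption and MAC keys from the customer's key.
    enc = hashlib.sha256(b"enc" + key).digest()
    mac = hashlib.sha256(b"mac" + key).digest()
    return enc, mac

def client_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Runs on the customer's side; returns nonce || ciphertext || tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(16)
    stream = hashlib.shake_256(enc_key + nonce).digest(len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def client_decrypt(key: bytes, blob: bytes) -> bytes:
    """Only someone holding the customer's key can recover the plaintext."""
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = hashlib.shake_256(enc_key + nonce).digest(len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

class ServiceStore:
    """Server side: holds only opaque blobs, never the customer's key."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, record_id: str, blob: bytes) -> None:
        self._blobs[record_id] = blob

    def get(self, record_id: str) -> bytes:
        return self._blobs[record_id]

# The customer generates and keeps this key. If they discard it, every
# stored blob becomes permanently unreadable ("crypto-shredding").
customer_key = os.urandom(32)
store = ServiceStore()
store.put("r1", client_encrypt(customer_key, b"alice@example.com"))
```

Anything the service is asked to produce is just `store.get("r1")`, which is meaningless without `customer_key`.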

> internal auditing

Personally identifiable information (PII) is not something we want to peruse. In fact, we purposely avoid seeing it, because that eliminates one potential source of mishandling.

> it's all backed up in some "data lake" somewhere to do internal ml or analytics on

That kind of use shouldn't get carte blanche to disregard retention policies. You can run those applications against replicated shards of the original data, and when the original gets reclaimed, so does the replica.
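A minimal sketch of tying analytics copies to the lifecycle of the original, assuming every replica is registered with the primary store so that deletes fan out to it. `PrimaryStore` and `AnalyticsReplica` are hypothetical names; a real system would more likely propagate deletes through a change-data-capture stream than through direct method calls.

```python
class AnalyticsReplica:
    """A derived copy used for ML or analytics; it never outlives its source."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def upsert(self, key: str, row: dict) -> None:
        self.rows[key] = dict(row)  # keep its own copy, like a separate system would

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)

class PrimaryStore:
    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}
        self.replicas: list[AnalyticsReplica] = []

    def register(self, replica: AnalyticsReplica) -> None:
        # Backfill the new replica so it mirrors the current primary.
        for key, row in self.rows.items():
            replica.upsert(key, row)
        self.replicas.append(replica)

    def write(self, key: str, row: dict) -> None:
        self.rows[key] = row
        for replica in self.replicas:
            replica.upsert(key, row)

    def reclaim(self, key: str) -> None:
        # Reclaiming the original removes every registered copy with it.
        self.rows.pop(key, None)
        for replica in self.replicas:
            replica.delete(key)

primary = PrimaryStore()
lake = AnalyticsReplica()
primary.register(lake)
primary.write("user-42", {"email": "bob@example.com", "plan": "pro"})
primary.reclaim("user-42")
```

The design choice is that replicas cannot be created outside `register`, so there is no copy the deletion path does not know about.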

> hundreds of copies in database backups from different times

Storing useless data forever is not cheap, especially at scale. It's better to store what needs to be stored and free up what can be freed when retention policies kick in (or the user requests it).
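A minimal sketch of a retention sweep, assuming each record carries a creation timestamp and an optional user deletion request. The 90-day window and the `Record`/`sweep` names are hypothetical, not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # hypothetical policy window

@dataclass
class Record:
    key: str
    created_at: datetime
    deletion_requested: bool = False

def sweep(records: list[Record], now: datetime) -> tuple[list[Record], list[str]]:
    """Split records into (kept, purged keys) under the retention policy."""
    kept, purged = [], []
    for rec in records:
        expired = now - rec.created_at > RETENTION
        if expired or rec.deletion_requested:
            purged.append(rec.key)  # past retention, or the user asked
        else:
            kept.append(rec)
    return kept, purged

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    Record("a", now - timedelta(days=10)),
    Record("b", now - timedelta(days=200)),
    Record("c", now - timedelta(days=5), deletion_requested=True),
]
kept, purged = sweep(records, now)
```

Here `"a"` survives, `"b"` ages out and `"c"` goes because the user asked; the same sweep would also have to run against backups for the purge to be real.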

> internal logs that contain the data

That's grounds for failing certain compliance audits. Logs should never contain PII in the first place; if they do, that's an operational failure.
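The real fix is not logging PII at all, but a redacting filter can act as a safety net for mistakes. A minimal sketch with Python's standard `logging` module, assuming email addresses are the PII to catch; the `RedactPII` name and the simple regex are illustrative only and will miss many PII formats.

```python
import io
import logging
import re

# Deliberately simple pattern; real PII detection needs much more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

class RedactPII(logging.Filter):
    """Rewrite each record's final message with email addresses masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg % args
        record.msg = EMAIL.sub("[REDACTED]", message)
        record.args = ()  # args were already merged into the message
        return True  # never drop the record, only scrub it

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.addFilter(RedactPII())
logger = logging.getLogger("billing")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("payment failed for %s", "carol@example.com")
```

The filter runs on the handler, so the address is masked before anything is written to `buffer`.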

> it's already been analyzed and aggregated into learning products and models that aren't going to be recomputed

That's a tricky one, but if those are actual models instead of giant lookup tables, one could assume the data is not reconstructible. However, that needs to be a design consideration of the models themselves, to prevent user data from persisting.