by watermelon0 1860 days ago
Hard deletes most likely need to be supported due to legal or contractual obligations. Designing with this in mind makes everything a lot easier in the long run.

I’ve always NULL’d values, not deleted rows. E.g. GDPR request? NULL out all identifying information, but keep the record.

As long as your primary key has no business meaning, you should never have to delete a row from a table.
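
A minimal sketch of that approach, using Python's built-in `sqlite3`. The `users` table, its columns, and the `null_out_user` helper are made up for illustration, not anyone's actual schema:

```python
import sqlite3

# Hypothetical schema: the `users` table and its columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"    # surrogate key with no business meaning
    " name TEXT,"
    " email TEXT,"
    " created_at TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO users (id, name, email, created_at) VALUES (?, ?, ?, ?)",
    (1, "Alice Example", "alice@example.com", "2020-01-01"),
)


def null_out_user(conn, user_id):
    """NULL the identifying columns but keep the row and its primary key."""
    cur = conn.execute(
        "UPDATE users SET name = NULL, email = NULL WHERE id = ?",
        (user_id,),
    )
    conn.commit()
    return cur.rowcount  # number of rows updated


affected = null_out_user(conn, 1)
row = conn.execute(
    "SELECT id, name, email, created_at FROM users WHERE id = 1"
).fetchone()
```

The row and its key survive, so foreign keys pointing at `users.id` stay valid. Whether the columns left behind are really non-identifying depends on the data.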

IANAL, but... you might want to revisit that code. Article 17, the right to erasure, is about erasure of personal data, not about making it non-identifiable. Of course, they don't define "erase" or "delete" :-)

(edit: typo)

If you erase all identifying parts it stops meeting the definition of personal data. That should be sufficient.
Well, to me the transaction is the same as deleting a record and populating a NULL record.

I don't see why the law should care in any way about a company populating NULL records.

> I don't see why the law should care in any way about a company populating NULL records.

It cares if the existence of this record still leaks private data. This is why talking about generic "records" here is pretty wrong - actual data is not interchangeable "records" where you can just slap on a generic cargo cult policy and think you're done.

Different use-cases require different data handling. Although, I do agree, for most CRUD cases it's enough to NULL out rows.

"NULL out all identifying information" is anonymization, not deleting the information.
I wrote zeroes to my hard drive. Would you consider the data on my hard drive merely anonymized, or is it deleted?
It's deleted because all connections (and metadata) have been erased.

In a database, if you null all fields but keep the entries and their inter-table relationships intact, you still have identifiable data up to a certain point.

Imagine you have data on specific people, with only one person per country and a user-to-country relationship. If you null a single field of a specific user, you can still find that user just by analysing which country the user was connected to, plus some external information.

This is the classical problem of deriving personal data from statistical reports, and it is why data anonymisation is so complex.
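
The one-person-per-country scenario can be sketched roughly like this. The rows and the `singled_out` helper are invented for illustration; it only flags rows whose remaining attribute value is unique in the dataset, which is a much weaker check than a real anonymisation review:

```python
from collections import Counter

# Hypothetical rows after "nulling": names removed, country link left intact.
users = [
    {"id": 1, "name": None, "country": "IS"},
    {"id": 2, "name": None, "country": "DE"},
    {"id": 3, "name": None, "country": "DE"},
    {"id": 4, "name": None, "country": "MT"},
]


def singled_out(rows, key):
    """Return ids of rows whose value for `key` appears only once."""
    counts = Counter(r[key] for r in rows)
    return [r["id"] for r in rows if counts[r[key]] == 1]


# Users 1 and 4 are the only ones in their country, so anyone who knows
# "the one customer from IS" can still tie row 1 back to a person.
at_risk = singled_out(users, "country")
```

Here users 2 and 3 share a country and are harder to tell apart, while users 1 and 4 remain identifiable even though every name is NULL.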