| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rm999 1860 days ago

>Soft deletes

This section is totally wrong IMO. What is the alternative? "Hard" deleting records from a table is usually a bad idea (unless it is for legal reasons), especially if that table's primary key is a foreign key in another table - imagine deleting a user and then having no idea who made an order. Setting a deleted/inactive flag is by far the least of two evils.

>when multiplied across all the analytics queries that you’ll run, this exclusion quickly starts to become a serious drag

I disagree, modern analytics databases filter cheaply and easily. I have scaled data orgs 10-50x and never seen this become an issue. And if this is really an issue, you can remove these records in a transform layer before it hits your analytics team, e.g. in your data warehouse.

>soft deletes introduce yet another place where different users can make different assumptions

Again, you can transform these records out.

4 comments

inopinatus 1860 days ago

Most order forms are snapshots of data at the instant of their lodgement, since they are sales contracts. It is a rookie blunder to link them relationally to master data for products and PII &c.

The record of an order is not intrinsically PII and thereby subject to rights of erasure. It may well be equally unlawful in some jurisdictions to irrevocably destroy it entire, it being necessary for accounting or tax audit, or even simply for mundane followup process, such as returns, that arise from actionable consumer rights. Ergo, such documents must fundamentally survive the erasure/redaction of any PII it does include.

jhgb 1860 days ago

> It is a rookie blunder to link them relationally to master data for products and PII &c.

Is it always? If that data is immutable, for example?

snowwrestler 1860 days ago

How are you going to satisfy data compliance, which may require the deletion of PII upon request or expiration, if your PII data is immutable?

atatatat 1860 days ago

Was just considering this when I came across your comment.

I'm hoping someone here can suggest a one-way audit-log audit-trail sort of solution, because I need this for the medical industry.

ranguna 1859 days ago

I would say the structure of the records themselves can stay, but not the data itself.

If you have a user table, maybe you can just randomly hash the sensitive user data but keep the record.

Not 100% sure about this though, since you could probably derive the user with statistics like: if it's known that 1 person gets a specific disease every 10 years and you have an obfuscated record of a person connected with that disease, it's fairly straightforward to derive who that person is just through that connection.

jokull 1860 days ago

The way to do it is to have foreign keys, but instead of hard delete you scrub data in the columns.

snowwrestler 1859 days ago

Seems like if you can scrub data, it is mutable? Maybe I misunderstand what “immutable” means.

mvc 1860 days ago

How are you going to respond to a warranty claim if you've wrongly deleted the order data when a subject requests that you delete all PII you have about them?

https://www.adyen.com/blog/gdpr-what-it-means-for-customer-p...

snowwrestler 1859 days ago

You store the sales contract separately, which is what this whole subthread is about.

https://news.ycombinator.com/item?id=27251168

jhgb 1860 days ago

I'm not sure how this changes anything. Is your PII in the order forms/sales contracts mentioned above? If yes, you'll have to delete those as well anyway, right? If it's not, the order forms/sales contracts themselves don't have to be linked to something that may potentially get deleted.

Please note that by "immutable" I don't mean that data won't get deleted eventually, just that it won't be deleted until nothing needs it anymore (and until then won't be mutated either), so basically the same thing that languages like Haskell mean by "immutable". Then, once you don't need it (= it's not observable anymore), it could perhaps get archived or erased, whatever you prefer.

inopinatus 1860 days ago

Then it isn’t master data anymore; it’s just one field of a record of a commercial document. This is taking a long way around to the same point.

(Presumably no-one is trying to reference-count GC their RDBMS. If so, I wish them all the luck in the world.)

jhgb 1860 days ago

I'm not quite sure what "master data" means in English (a non-native language to me) but Wikipedia tells me that it's "data about the business entities that provide context for business transactions" (and lists examples that sound relevant for this situation to me). Based on that I'm inclined to think that this would qualify.

tgbugs 1860 days ago

Hard deletes are also awful from the perspective of data preservation. For example, when youtube removes a video they also delete all the metadata or any indication that it ever existed. Countless people have lost what they thought was a secure record of at least the title of songs or videos they saved to a playlist.

There is also a more sinister side, which is that the ability to hard delete something forever means that bad actors can fabricate old "deleted" documents and accuse someone of having created and then deleted them.

alexpetralia 1860 days ago

I do think that hard deletes may sometimes be required to comply with legal requirements (e.g. complete expungement of personal information relating to a user). If it is not required by statutory law, sometimes it is written into commercial contracts.

bulhi 1860 days ago

Exactly. I get OP's point (i.e. you can accidentally include softdeleted records in your results), but for some types of data hard deletes are an absolute no-go anyways, so you just have to live with it.

irrational 1860 days ago

> if that table's primary key is a foreign key in another table - imagine deleting a user and then having no idea who made an order

Assuming you have constraints set up correctly (on delete no action or on delete restrict) then how could this ever happen? If you don’t have constraints set up correctly…

vericiab 1860 days ago

Assuming you're deleting the row because it shouldn't be used by read queries, constraints like you described prevent the problem of having orphaned records in the child table but also prevent you from achieving your goal. On delete cascade would allow you to achieve your goal and prevent the orphaned records but could lead to deleting more than intended (especially if the child table is also a parent table referenced by further foreign key constraints, its children could in turn have children, etc). Of course, with no action/restrict you could also manually cascade the delete, but if you actually don't want to delete a child row and there's not an appropriate alternative value for its foreign key then you're in a bit of a pickle.

So if you want to delete a user but keep the records of their orders and still know who made those orders, then some form of soft delete is probably your best option. I believe that's the point rm999 was making (in response to the article asserting that soft deletes are a "data model mistake"). Properly configured constraints can prevent an "oops" but don't really do anything to solve the problem of this sort of delete from some contexts but not others.

andrewprock 1860 days ago

The chance that you don't have constraints set up correctly is indistinguishable from 100%.

dalyons 1860 days ago

Foreign key constraints are a waste of effort for most at-scale web apps. They guard against a subset of problems that aren’t actually problems (orphaned rows) at a putative cost (for the db)

irrational 1860 days ago

I disagree. Any fairly competent DBA will know how to setup the constraints correctly. It's not rocket science. If you can think logically enough to program, you can think logically enough to set up constraints correctly.

ako 1860 days ago

This should be the responsibility of the application developer creating the database schema and queries. A constraint is part of your application logic, not of the administration of the database.

yxhuvud 1860 days ago

I've yet to see anything I'd consider calling a startup having a DBA. I'm positively impressed if they even default to use foreign keys.

cerved 1860 days ago

Not everybody has a DBA :(

irrational 1860 days ago

Well, my team has never had an official DBA either. But we do all the tasks of a DBA. I've installed/upgraded, configured, tested backups, and hardened Oracle and Postgres more times than I can remember. We do all our own DML and DDL work. It really isn't hard to do it right.

quietbritishjim 1860 days ago

That would just make the data loss problem worse still. I realise OP just chose an arbitrary example, but if you really are talking about users and orders, and if you delete a user, then really deleting the records for their associated orders is even worse than losing track of who made them.

irrational 1860 days ago

I realize you might not be familiar with how database constraints work, but an on delete no action would totally prevent the user from being deleted. You literally cannot delete the user and their associated orders will definitely not be deleted.

stouset 1860 days ago

GP did not suggest ON DELETE CASCADE.

watermelon0 1860 days ago

Hard deletes most likely need to be supported, due to legal or contractual obligations. Designing with this in mind, makes everything a lot easier in the long run.

rm999 1860 days ago

I’ve always NULL’d values, not deleted rows. E.g. GDPR request? NULL out all identifying information, but keep the record.

As long as your primary key has no business meaning you should never have to delete the row of a table.

FigmentEngine 1860 days ago

INAL, but... you might want to revisit that code. article 17, right to erasure is about erasure of personal data, not about making non-indentifiable. of course they dont define erase or delete :-)

(edit: typo)

Tarq0n 1860 days ago

If you erase all identifying parts it stops meeting the definition of personal data. That should be sufficient.

Aperocky 1860 days ago

well to me the transaction is the same as deleting a record and populating a NULL record.

I don't see why the law should care in any way about a company populating NULL records.

izacus 1860 days ago

> I don't see why the law should care in any way about a company populating NULL records.

It cares if the existence of this record still leaks private data. This is why talking about generic "records" here is pretty wrong - actual data is not interchangeable "records" where you can just slap on a generic cargo cult policy and think you're done.

Different use-cases require different data handling. Although, I do agree, for most CRUD cases it's enough to NULL out rows.

FigmentEngine 1860 days ago

"NULL out all identifying information" is anonymization, not deleting the information.

duncans 1860 days ago

I wrote zeroes to my hard drive. Would you consider the data on my hard drive merely anonymized, or is it deleted?