Hacker News
by digitalneal 2678 days ago
Pardon my ignorance here, never worked on this kinda program before, but isn't it hard on complex databases like Facebook's to actually remove data, and better just to mark it as unviewable?
4 comments

It's not just complex databases, but any database that uses indexes (basically any production database).

Deleting a record from a table can cause a re-index, which is very intensive. It's much easier to flag a column in a record as "deleted" or whatever, and then run a cleanup during off-peak hours.
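As a rough illustration of the flag-then-cleanup pattern described above, here is a minimal sketch using Python's built-in sqlite3 module. The `posts` table, the `deleted` column, and the helper names are all hypothetical, picked only for the example; this is not how Facebook or any particular site does it, and a real system would also need to deal with backups, replicas, and caches.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, "
        "body TEXT, "
        "deleted INTEGER NOT NULL DEFAULT 0)"  # assumed soft-delete flag
    )
    conn.executemany(
        "INSERT INTO posts (body) VALUES (?)",
        [("first",), ("second",), ("third",)],
    )
    return conn


def soft_delete(conn, post_id):
    """Mark a row as deleted; the row itself stays in the table."""
    conn.execute("UPDATE posts SET deleted = 1 WHERE id = ?", (post_id,))


def visible_posts(conn):
    """Return only the rows that have not been flagged as deleted."""
    rows = conn.execute("SELECT body FROM posts WHERE deleted = 0 ORDER BY id")
    return [row[0] for row in rows]


def purge_deleted(conn):
    """Off-peak cleanup: physically remove flagged rows, return how many."""
    cur = conn.execute("DELETE FROM posts WHERE deleted = 1")
    return cur.rowcount
```

Calling `soft_delete(conn, 2)` hides the row from `visible_posts` right away, but the data is still on disk until something like `purge_deleted` actually runs, which is the part the rest of this thread is arguing about.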

I'm sure there are clever ways around that with proper knowledgeable DBAs on your team, but as I'm a web dev for smaller audience projects, I don't touch solutions that require those types of optimizations that I'm sure Facebook has implemented.

> Deleting a record from a table can cause a re-index

So will adding new data, and FB loves adding to their profiling DB.

This is not an indexing issue but a "we love adding data but hate removing it" issue.

Long story short, no. If that data were a liability instead of an asset, it would be done yesterday.
Why? Google has had the option for many years; it is hard but definitely doable.
I'm sure there's some technical reason they could come up with, but to me, that's admitting they didn't design the database with that use-case in mind.

Disclaimer: Have no experience with databases at that scale so maybe this isn't entirely unreasonable.

The HN database must be very complex then, since you can't do either.