Hacker News
by Triv888 1961 days ago
> I wouldn't be surprised if Google didn't have great records of deleted content.

It would surprise me if Google didn't keep everything. Harvesting data is part of their business model.


>Soft deletion implies that once data is marked as such, it is destroyed after a reasonable delay. The length of the delay depends upon an organization’s policies and applicable laws, available storage resources and cost, and product pricing and market positioning, especially in cases involving much short-lived data. Common choices of soft deletion delays are 15, 30, 45, or 60 days.

https://sre.google/sre-book/data-integrity/#first-layer-soft...

I work at Google, not on anything related to this though.
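
To make the quoted definition concrete, here is a minimal sketch of soft deletion with a delayed purge. It is not how Google implements it; `SoftDeleteStore`, `Record`, and the 30-day `SOFT_DELETE_DELAY` are hypothetical names and values picked for illustration (30 days is just one of the common delays the quote lists).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# Assumed retention window; the quoted passage lists 15, 30, 45, or 60 days.
SOFT_DELETE_DELAY = timedelta(days=30)


@dataclass
class Record:
    payload: str
    deleted_at: Optional[datetime] = None  # None means the record is live


class SoftDeleteStore:
    """Hypothetical in-memory store illustrating soft deletion."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def put(self, key: str, payload: str) -> None:
        self._records[key] = Record(payload)

    def soft_delete(self, key: str, now: datetime) -> None:
        # Only mark the record; the payload stays recoverable for now.
        self._records[key].deleted_at = now

    def undelete(self, key: str) -> None:
        # Works only while the record has not been purged yet.
        self._records[key].deleted_at = None

    def get(self, key: str) -> Optional[str]:
        rec = self._records.get(key)
        if rec is None or rec.deleted_at is not None:
            return None  # soft-deleted records are hidden from readers
        return rec.payload

    def purge_expired(self, now: datetime) -> int:
        # Destroy records whose soft-deletion delay has elapsed.
        expired = [
            key
            for key, rec in self._records.items()
            if rec.deleted_at is not None
            and now - rec.deleted_at >= SOFT_DELETE_DELAY
        ]
        for key in expired:
            del self._records[key]
        return len(expired)
```

A periodic job would call `purge_expired(datetime.now())`; until that runs past the delay, `undelete` can still bring the record back.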

Then consider yourself surprised. Deleted data is actually deleted. Keeping it around is a huge liability.
I don't know how things work these days, but a few years ago, Google's GFS didn't support deletion. The "delete" flag only meant "don't replicate this data" and it was just simpler for them to keep it around until the disk died.

Source: This was published by Google in their research papers on GFS. Sorry, can't remember which paper.

Without reading the paper, I'm about 99% positive it would also mark the data as unused, which would allow the disk space to be reclaimed and overwritten. Disks aren't write-once.

It would also likely remove it from the index, meaning you can't find it if you go looking for it, unless you already happen to know which disk it's on and it hasn't been overwritten yet.
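
A toy sketch of what I mean, which is emphatically not GFS: `ToyDisk` and everything in it is made up for illustration. Delete drops the index entry and marks the block free; the bytes sit there until a later write reuses the block.

```python
from typing import Dict, List, Optional


class ToyDisk:
    """Hypothetical fixed-size block store; an illustration, not GFS."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[Optional[bytes]] = [None] * num_blocks
        self.free: List[int] = list(range(num_blocks))  # reusable block ids
        self.index: Dict[str, int] = {}  # name -> block id

    def write(self, name: str, data: bytes) -> int:
        if name in self.index:
            self.delete(name)  # release the old block before rewriting
        if not self.free:
            raise RuntimeError("disk full")
        block = self.free.pop(0)
        self.blocks[block] = data  # overwrites whatever was left behind
        self.index[name] = block
        return block

    def read(self, name: str) -> Optional[bytes]:
        block = self.index.get(name)
        return None if block is None else self.blocks[block]

    def delete(self, name: str) -> None:
        # Drop the index entry and mark the block free.
        # The stale bytes are NOT wiped here.
        block = self.index.pop(name)
        self.free.append(block)
```

After `delete`, `read` returns `None`, but someone who knows the block id can still see the old bytes in `blocks` until that block gets reused.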

So basically a recycle bin rather than a shredder.
But said recycle bin is actually automatically emptied periodically.
GFS is ancient, obsolete technology that is no longer used (the whitepaper was published almost two decades ago, and it described a system that was already built and in use!). Also, I don't think you're interpreting it correctly.
Under GDPR, I'm fairly certain that they've implemented actual deletion. GDPR even requires companies to go so far as to scan old tapes to delete user records on request, within a one-month deadline, iirc.
Good SRE practices suggest keeping certain kinds of data around for a while before the deletion is fully completed.