by oefrha 2411 days ago
> You just don’t know about it precisely because companies are normally not public and honest about it.

If a big company lost a ton of user data, I'd absolutely know about it, whether they have Apple-level secrecy or not.


The incident described did not result in loss of tons of user data, and neither will most incidents, whether you choose to be open about them or not.
What are you talking about?

> This incident caused the GitLab.com service to be unavailable for many hours. We also lost some production data that we were eventually unable to recover. Specifically, we lost modifications to database data such as projects, comments, user accounts, issues and snippets, that took place between 17:20 and 00:00 UTC on January 31. Our best estimate is that it affected roughly 5,000 projects, 5,000 comments and 700 new user accounts.

https://about.gitlab.com/blog/2017/02/10/postmortem-of-datab...

Yes, most incidents from most companies don’t result in this kind of data loss, which is why GitLab stood out.

How do you know what most incidents result in? For example, when GitHub deleted their production database[1], they simply gave no numbers of affected users/repositories. We do know that the platform already had over 1M repositories[2], so 5000 affected seems perfectly possible, but their lack of transparency protected them against such claims. And that lack of transparency seems to me to be the norm.

[1] https://github.blog/2010-11-15-today-s-outage/

[2] https://github.blog/2010-07-25-one-million-repositories/

MySpace lost all its music from 2003 to 2015: https://news.ycombinator.com/item?id=19417640

Probably a few hundred TB or so. Maybe nearly a petabyte?

That’s the point: we know about that. Hard to believe “this happens everywhere” when we only know a few instances, and any instance would be picked up by media.
I've had to help clean up after any number of data losses or near losses that have never been made public, ranging from someone mkfs'ing the wrong device on a production server to truncating the wrong table. In some cases, people afterwards had to write awful scripts to munge log files (that were never intended for that purpose) to reconstruct data that was too recent for the last backup.

Of course there are people who avoid this, but I've seen very few places where the processes are sufficient to fully protect against it; a lot of people get by more on luck than proper planning. Often these incidents are down to cold, hard risk calculations: people know they're taking risks with customer data and have deemed them acceptable.