by tetha 836 days ago
Curiously, this is a big facet in our dev/ops re-organization.

For example, we in infra-operations are responsible for storing the data customers upload into our systems. This data has to be considered non-reproducible, especially if it's older than a few days. If we lose it, we lose it for good, and then people are disappointed and get angry.

As such, large-scale data wipes are handled very carefully, with manual approvals from several different teams. The full deletion of a customer goes through us, account management, contract and us again. And this is fine. Even with the GDPR and such, it is entirely fine that deleting a customer takes 1-2 weeks, especially because the process has caught errors in other internal processes and in our customers' processes. Suddenly you're the hero vendor if the customer goes "Oh fuck, noooooo".

On the other hand, stateless code updates without persistence changes are supposed to be able to move as fast as the build server allows. If it goes wrong, just deploy a fix with the next build or roll back. And sure, you can construct situations in which code changes cause big, persistent, stateful issues, but these tend to be rare with a decent dev-team.

We as infra-ops and central services need to be robust and reliable, and are fine with shedding speed (outside of standard requests) for this. A dev-team with a good understanding of stateful and stateless changes should totally be able to run into a wall at full speed, since they can stand back up just as quickly. We, meanwhile, are easily looking at hours of backup restore for hosed databases. And no, there is no way to speed that up without hardware changes.