by huslage 2609 days ago
"Given the nature of this update and the need to ensure the highest level of security, we have provided limited advanced notice. We understand that the maintenance process impacts our customers, and we apologize in advance for any inconvenience this may cause." -- This is a ridiculous statement. Security does not come from opaque statements made at the last moment.

I fail to understand how updating a "cloud native" service requires downtime at all, much less this sort of outage.

3 comments

Translation: In the aftermath of the hack last week, they found additional major security issues that require downtime to fix. They are limiting the advance notice to give attackers less time to find those issues before they can be fixed.
Quoting from the page:

"Q: Is Docker experiencing a security incident?

A: No, this is a scheduled update and a proactive step we are taking to provide the best possible customer experience and highest level of security."

scheduled when...
We just scheduled it right now.
I don't think they're saying it's more secure because they didn't tell anyone.

In a prior job, we discovered that some automation had left ports open that could theoretically be accessed by customers, so a malicious actor could theoretically have loaded software onto a privileged instance. We had to treat all devices as potentially compromised. That meant running automation to tear them down and rebuild them. This took days.

We were able to do this without a visible outage, but the devices affected were mostly doing NAT. If a more central component is affected, you do need downtime to back it up (that's the read-only window: you can't let customers write data that won't be in the backup) and then rebuild it and put everything back together.
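
To make the read-only step concrete, here is a toy sketch in Python. Everything in it (the `Store` class, the `rebuild_with_downtime` function, the in-memory dict) is hypothetical and only illustrates the ordering: freeze writes, take the backup, rebuild from it, then reopen. It is not how Docker or any real service does it.

```python
class Store:
    """Toy in-memory store with a read-only switch (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.read_only = False

    def write(self, key, value):
        # Reject writes during maintenance so nothing lands after the backup.
        if self.read_only:
            raise PermissionError("store is read-only during maintenance")
        self.data[key] = value


def rebuild_with_downtime(store):
    """Freeze writes, back up, rebuild from the backup, then reopen."""
    store.read_only = True            # start of the read-only window
    try:
        backup = dict(store.data)     # snapshot taken while no writes can land
        store.data = dict(backup)     # stand-in for "tear down and rebuild"
    finally:
        store.read_only = False       # reopen writes even if the rebuild fails
    return backup
```

The point of the freeze is that the backup and the rebuilt state are guaranteed to match; without it, a write arriving between the snapshot and the rebuild would be silently lost.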

Also, it's just easier to get it right with downtime. And they're probably padding these estimates to handle the inevitable things that go wrong during maintenance.

>I fail to understand how updating a "cloud native" service requires downtime at all, much less this sort of outage.

Probably something to do with DB consistency.

Possibly a migration that's locking the auth tables?