by gizmo 2341 days ago
They don't say they're sorry, because they're not. Instead they minimize their actions by: 1) stating how few customers were affected, 2) how it's not really their fault because it was a hardware error, 3) it's not really their fault because they had already planned to upgrade the server, 4) it's not really their fault the restore procedure took so long because they had to make backups first, 5) the restore took so long because spinning disks are slow, and they really had no way to know this in advance. And to top it all off they point out they're not contractually obligated to provide working snapshots at all, so really it's the customers who are at fault here.

The take-away here is clear: don't trust Gandi with anything you care about.

No data was lost though, true?

I don't know if I expect a postmortem to say "sorry", and I think you are being needlessly harsh. But I agree this level of service doesn't seem up to current best in class, like Amazon etc. (which of course still have unexpected outages very occasionally, although a 5-day time to recovery would certainly be... unusual).

But this partially shows how much expectations/standards have risen in the past few to 10 years. When an unacceptable, not-up-to-par level of reliability still involves no data loss, we're doing pretty well. And I think "don't trust Gandi with anything you care about" is probably an exaggerated response. But yes, they don't seem to be providing mega-cloud-service-provider level of service.

Honestly, I too was looking for a straightforward 'We screwed up, sorry.' I wouldn't care nearly as much if they'd just had 5 days without snapshots. But how poorly they handled support deserves to be addressed in a postmortem.

See this thread for the support at the time:

https://twitter.com/andreaganduglia/status/12152827193300664...

> Honestly, I too was looking for a straightforward 'We screwed up, sorry.'

They do so a bit here: https://news.gandi.net/en/2020/01/major-incident-on-our-host...

>We’re very sorry for this truly unfortunate incident and we offer our sincere apologies to anyone impacted.

Fair enough. That probably is bad marketing if nothing else, and maybe something else. What you're saying about the major failure being in support/customer-management, even more than the technical issue, seems potentially reasonable. (I am not a Gandi customer, so it's not personal for me.)

I still think the fact that there was no data loss, and we're still on the edge of calling it unacceptable incompetence, is worth noting, as to how far our expectations and standards have come. Which is good of course.

The linked twitter thread explicitly mentions data loss:

> Hi Andrea. It is confirmed we have lost data and we are terribly sorry for that. However, please note that what happenend[sp] could happen to any web host.

Customers that were forced to migrate to a different webhost had to restore from whatever backups they had, and they lost data for sure. Even if Gandi ultimately recovered everything (and it's not completely clear if they did) at that point the customer data/databases have already been forked so it's too late.

OK, that makes it even worse then. The postmortem linked above definitely says:

> We managed to restore the data and bring services back online the morning of January 13.

Is that wrong? It's bad to lose data, it's even worse to tell people you didn't lose data in one place when you did, and tell them you did lose data in another.

There was confusion about this. Originally they thought they had lost all data, which is why a lot of people went crazy at them via Twitter. They later said there might be a chance to recover data, and luckily they ended up finding a way to recover it.

> No data was lost though, true?

What about any data that would have accumulated in those 5 days? This was storage for their IAAS and PAAS products, so anyone using those lost access for 5 days?

Well, it's a technical post-mortem, not a love letter. I'm not affiliated with Gandi in any way, but I find the finger-pointing a bit too pedantic.

> The take-away here is clear: don't trust Gandi with anything you care about.

The take-away is not this one. It's: back up anything you care about.

They called snapshots backups in their web interface when viewing snapshots. From their docs:

> "Snapshots allow you to create a backup copy of a volume"

https://pbs.twimg.com/media/EN2UZ6TX4AAMe-H?format=png&name=...

They are doing a lot of preaching about backups while failing to do internal backups (not customer-facing backups) of their own products.

As the postmortem says, they agree that they should have stated more obviously that backup availability was not contractually assured.

The postmortem says "we don’t provide a backup product for customers" while the docs describe the snapshots as a backup (see screenshot from my higher level comment). This is the disconnect for me that I'm sure is causing a lot of the frustration they are hearing from customers. They are not accepting that they sold the snapshots as a backup and this is disappointing in a postmortem where users are looking for empathy, acknowledgement, and a path forward.

I for one am glad they released a factual account and timeline of what went wrong. I don't see it as an attempt to minimize their actions. They even admit that they have no clear explanation of the original issue, when they could easily have committed to a stronger theory to make themselves look more competent. Overall I'd much rather read this than a massaged PR apology that keeps us in the dark of what actually happened.

This is the massaged PR “postmortem”. It’s basically a shoulder shrug emoji and takes zero responsibility for the incident.

They also failed to address their abysmal responses on Twitter that essentially belittled and poked fun at the affected users.

E.g. https://news.ycombinator.com/item?id=22002258

>They don't say they're sorry,

>We’re very sorry for this truly unfortunate incident and we offer our sincere apologies to anyone impacted.

https://news.gandi.net/en/2020/01/major-incident-on-our-host... (linked in the Postmortem)