by nwrk 3640 days ago
Don't like the attitude. Pointing fingers doesn't help paying customers trapped by Layer's poor design choices.

Especially when they seem to be referencing only a single region. Multi-region deployment is the most basic protection against outages when using IaaS.
They should at least be in multiple availability zones. Running in multiple regions often comes with a lot of challenges, but there isn't much reason not to be redundant across multiple AZs.
> Running in multiple regions often comes with a lot of challenges

As an almost-customer of Layer (before their massive price increase), they led me to believe that this was one of the problems they would be solving for me. Nowhere on their website does it say, "We save money by not following best practices, so plan accordingly for occasional outages!"

Were they or were they not in multiple AZs? Developing for multiple availability zones is trivial when creating cloud-first software (and it's irresponsible not to use AZs!); multi-region, by contrast, comes with its own set of problems.
In the report they say they're looking at moving to a new region, but Google apparently told them that us-central1-a was down. The "-a" suffix makes it a zone (an AZ) within the us-central1 region, not a region in its own right. It sounds like they're only in one AZ and may not fully understand the difference.
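
To make the region/zone distinction concrete, here is a rough sketch that splits a zone name into its region and zone letter and classifies a deployment. It assumes names follow the "<region>-<letter>" shape seen in us-central1-a; the helper names split_zone and redundancy are made up for illustration and are not part of any Google SDK.

```python
import re

# Assumed naming shape: "<region>-<zone letter>", e.g. "us-central1-a".
# This regex is an illustration, not an official grammar for zone names.
_ZONE_RE = re.compile(r"^(?P<region>[a-z]+-[a-z]+\d+)-(?P<zone>[a-z])$")


def split_zone(name):
    """Return (region, zone_letter), or None if the name is not a zone."""
    m = _ZONE_RE.match(name)
    if m is None:
        return None
    return m.group("region"), m.group("zone")


def redundancy(zone_names):
    """Classify a deployment as single-zone, multi-zone, or multi-region."""
    if not zone_names:
        raise ValueError("no zones given")
    parsed = [split_zone(z) for z in zone_names]
    if any(p is None for p in parsed):
        raise ValueError("not a zone name: expected something like us-central1-a")
    regions = {region for region, _ in parsed}
    zones = set(parsed)
    if len(regions) > 1:
        return "multi-region"
    if len(zones) > 1:
        return "multi-zone"
    return "single-zone"


print(split_zone("us-central1-a"))
print(redundancy(["us-central1-a"]))
print(redundancy(["us-central1-a", "us-central1-b"]))
```

A bare region name such as us-central1 does not match the pattern, so split_zone returns None for it. A deployment that lists only us-central1-a comes out as single-zone, which is the situation described above.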

[correction: they accidentally called usc1-a a region, but everything mentioned in their outage was a zone. They specifically called it a "deployment zone" not an availability zone, so it sounds like an issue of inexperience with best practices.]

[obligatory disclaimer: I'm a Google employee. I don't have a relationship with Layer]

Yup. Just as too few companies realize that cloud doesn't mean one remote box replacing one local box, too few realize that putting all their eggs in one regional basket (or even a single cloud provider) is unwise.
Blake from Layer here: I've reviewed the updates from last night and I don't feel like the tone was out of line. We were simply trying to provide our customers with complete transparency about where the issue was and where we were in restoring service.

With that said, we do feel that Google came up short in their responses to us over the course of the issue. We pay handsomely on a support contract to get off-hours responses and issue escalations. The responses we received were hand-wavy and vague, leaving us without sufficient data to make decisions. We have raised these concerns with our Google representative and will be working with them to tighten our partnership going forward.

We take full responsibility for this event and are working to cover the exposure. Building a system and business with resource constraints and complex distributed technologies is a long game of managing risk and trade-offs. We're human and we make bad calls along the way. We are very sorry; we violated our commitments to our customers and their users. The entire Layer engineering team is head down right now working to make it right.

That was my first thought as well. Why did they need to start migrating customers to another AZ? I hope their customers started asking that as well. The title should be "Poor Design Choices Bring Down Layer"