by lucacasonato 1441 days ago
A bit of a blunt statement on my part. There is monitoring for a multitude of other connection-related issues (e.g. TLS handshake failures, missing SNI, etc.). We should have had monitoring for this specific failure, where the load balancer did not have any healthy backends, but as mentioned in the post, the load balancer was programmed in a way that this should never have been able to happen in the first place (as the LB should have un-advertised itself if there are no healthy backends).

We are capable of learning from past mistakes though, and as such we'll make sure to add more monitoring for these kinds of scenarios so we can be alerted to a root cause earlier. We will do better.
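To make the failure mode above concrete, here is a minimal sketch of the two checks being described: the load balancer deciding whether it should stay advertised, and a separate monitor that alerts when that safeguard did not work. The names (`Backend`, `check_load_balancer`, `currently_advertised`) are hypothetical and chosen only for illustration; this is not the actual load balancer logic from the post.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool


def check_load_balancer(backends, currently_advertised):
    """Return (should_advertise, alerts) for one load balancer.

    Hypothetical helper: the LB should only be advertised while it has
    at least one healthy backend. Alerts fire independently of that
    decision, so a failed withdrawal is still noticed.
    """
    healthy_count = sum(1 for b in backends if b.healthy)
    should_advertise = healthy_count > 0

    alerts = []
    if healthy_count == 0:
        alerts.append("no healthy backends")
    if currently_advertised and not should_advertise:
        # The case that "should never happen": still taking traffic
        # with nothing to send it to.
        alerts.append("advertised with no healthy backends")
    return should_advertise, alerts
```

The second alert is deliberately redundant with the un-advertise logic: it only fires when the safeguard itself has failed, which is the scenario that had no monitoring here.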

1 comment

API monitoring is the practice of making calls to an API to check that it works: live end-to-end tests. We do at least a ping for every API in every region. It's still hard to pinpoint these issues sometimes.
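A per-region ping like the one described could look roughly like the sketch below. It uses only the Python standard library; the `endpoints` mapping, the function names, and the "2xx means healthy" rule are assumptions for illustration, not a description of any particular monitoring product.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return (ok, detail) for a single GET request to url."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, refused connections, TLS errors, timeouts.
        return False, f"connection error: {exc}"
    elapsed_ms = (time.monotonic() - start) * 1000
    return 200 <= status < 300, f"HTTP {status} in {elapsed_ms:.0f} ms"


def probe_all(endpoints, probe=http_probe):
    """Probe every endpoint and return only the failures.

    endpoints maps (api_name, region) -> URL (hypothetical layout).
    """
    failures = {}
    for key, url in endpoints.items():
        ok, detail = probe(url)
        if not ok:
            failures[key] = detail
    return failures
```

Keying results by `(api_name, region)` helps with the pinpointing problem a little: a failure in one region only points at that region's edge, while the same API failing everywhere points at the API itself.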