by bcbrown 3806 days ago
I've worked at a late-stage startup that served billions of requests per day at peak load, and there was no on-call for developers. I believe the ops team did have an on-call rotation, but that was more at the infrastructure level, since everything was self-hosted at colos.

This was done by having a simple and resilient serving architecture. Every server was stateless. In addition, all the complicated logic was pre-computed offline into immutable lookup tables. So if the offline pre-computation job fails, it doesn't cause downtime, and the fix can wait until the next workday.
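
To make the idea concrete, here is a rough sketch of that pattern in Python. It is not the actual system; the names `TableStore`, `publish`, and `handle_request`, and the `"default"` fallback value, are all made up for illustration. It also assumes that a failed build leaves the previously published table in place. The point is only that the request path is a pure lookup against a read-only table, and a broken offline build never touches what is being served.

```python
from types import MappingProxyType
from typing import Callable, Mapping


class TableStore:
    """Holds the currently published lookup table (hypothetical helper)."""

    def __init__(self, initial: Mapping[str, str]) -> None:
        # MappingProxyType gives a read-only view, so serving code cannot mutate it.
        self.current: Mapping[str, str] = MappingProxyType(dict(initial))

    def publish(self, build: Callable[[], Mapping[str, str]]) -> bool:
        """Run an offline build and swap the table in only if the build succeeds."""
        try:
            new_table = MappingProxyType(dict(build()))
        except Exception:
            # Build failed: the old table stays published, so serving is unaffected.
            return False
        self.current = new_table  # single reference swap
        return True


def handle_request(store: TableStore, key: str) -> str:
    # Stateless request path: no per-request state, just a table lookup.
    return store.current.get(key, "default")
```

With something like this, a failing `publish` call just returns `False` and requests keep getting answers from the last good table, which is why nobody needs to be paged for it.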

We had a robust QA process, but it was far from stultifying.