| I believe we just have rather different approaches here. "Have you tested how the system behaves when the underlying instances have a sustained CPU spike?" Since dedicated boxes are cheap, I'd just buy 5x the CPU resources that I reasonably need and call it a day. If there ever is a traffic spike of more than 5x, Docker will prevent it from becoming a noisy neighbor, so the affected services will just become slower than usual. But even a 10x traffic multiplier would just produce a 2x slowdown, which should be tolerable for most users. I agree that on clouds you want to save costs by only booking what you need. But on bare metal, you can usually afford to keep spare capacity around all the time. As such, I wouldn't plan for the system to behave well under stress. I'd try to always have enough resources around so that stress never happens. At the end of the day, this seems like a developer time vs. resource cost trade-off, and for most companies developers are scarce and resources are plentiful, so they'll face a very different trade-off from big FAANG companies. "For example, is auto-scaling set up, and does it behave as expected?" If your system is usually 90% idle, I wonder if you'll ever need that auto-scaling. Also, I'd say my customers can endure it if page load time goes up from 100ms to 200ms. So in my opinion, there is little need for auto-scaling for most companies. |
You didn't really address this question. You addressed a different one, which is a traffic spike.
>Also, I'd say my customers can endure it if page load time goes up from 100ms to 200ms. So in my opinion, there is little need for auto-scaling for most companies.
100ms to 200ms on average? What about the tail? Your app might go from a P99 of 500ms to a P95 that times out. That's when you'll lose customers.
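To put rough numbers on the tail argument, here is a minimal sketch. It assumes response times are exponentially distributed (as in a textbook M/M/1 queue), which is a modeling assumption and not a measurement of anyone's system. The 1000ms timeout and the helper names `percentile_ms` and `fraction_over` are made up for illustration.

```python
import math

# Hypothetical client-side timeout, chosen only for illustration.
TIMEOUT_MS = 1000.0


def percentile_ms(mean_ms: float, p: float) -> float:
    """Latency at quantile p (0 < p < 1), assuming an exponential
    distribution with the given mean (an assumption, not a measurement)."""
    return -mean_ms * math.log(1.0 - p)


def fraction_over(mean_ms: float, timeout_ms: float) -> float:
    """Share of requests slower than timeout_ms under the same assumption."""
    return math.exp(-timeout_ms / mean_ms)


# Compare the "100ms" and "200ms" averages from the quoted comment.
for mean_ms in (100.0, 200.0):
    p95 = percentile_ms(mean_ms, 0.95)
    p99 = percentile_ms(mean_ms, 0.99)
    over = fraction_over(mean_ms, TIMEOUT_MS)
    print(f"mean={mean_ms:.0f}ms  P95={p95:.0f}ms  P99={p99:.0f}ms  "
          f"over {TIMEOUT_MS:.0f}ms: {over:.4%}")
```

Under this model, doubling the mean from 100ms to 200ms moves the P99 from roughly 460ms to roughly 920ms, and the share of requests exceeding the assumed 1000ms timeout grows by a factor of about 150. Real systems often have heavier tails than an exponential, so treat this as a rough lower bound on how much worse the tail can get.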