by ses1984 2119 days ago
>"Have you tested how the system behaves when the underlying instances have a sustained CPU spike?"

You didn't really address this question; you addressed a different one, about a traffic spike.

>Also, I'd say my customers can endure it if page load time goes up from 100ms to 200ms. So in my opinion, there is little need for auto-scaling for most companies.

100ms to 200ms average? What about the tail? Your app might go from a P99 of 500ms to a P95 that times out. That's when you'll lose customers.
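
To put rough numbers on the average-versus-tail point, here is a minimal sketch. Everything in it is made up for illustration: the 2s timeout, the sample latencies, and the helper names `percentile` and `summarize` are assumptions, not measurements from anyone's app.

```python
# Hypothetical numbers only: shows how a mean can merely double
# while the P95 moves all the way to the timeout.

TIMEOUT_MS = 2000  # assumed request timeout, not from the thread


def percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p% of n
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Before the slowdown: most requests are fast, 1.5% take 500ms.
before = [94] * 985 + [500] * 15

# During the slowdown: typical requests are still fast,
# but 6% of them now run into the timeout.
after = [85] * 940 + [TIMEOUT_MS] * 60

print("before:", summarize(before))
print("after: ", summarize(after))
```

With these invented samples the mean goes from about 100ms to about 200ms, while the P99 starts at 500ms and the P95 ends up at the timeout, so a dashboard that only tracks the average would miss the 6% of requests that fail.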


If the underlying hardware is a bare metal server, it won't magically turn slow and have a CPU spike. That problem is caused by noisy neighbors and is largely exclusive to clouds.

Well, with the 2x example, my app might go from a 1s P99 to a 2s P99, which feels slow but is still doable. Again, those timeouts are usually introduced by cloud infrastructure. For example, if you use nginx outside of Heroku, it won't have a 30s timeout for file downloads.

Your own instances can have an unexpected CPU spike.

Even if you're running on bare metal I find it hard to believe you don't have a layer with short timeouts between your front and backend.

Why would I? I have redundant 1 Gbit LAN cables between front end, back end, and database servers.

Because it's bad UX for your users to see a spinning loading icon forever.

And a timeout error would be better?

In my experience, yes, a lot better.
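
For anyone wondering what a short timeout between front end and back end might look like, here is a rough sketch using Python's `asyncio.wait_for`. The 50ms budget, the fake `slow_backend` coroutine, and the 504 response are all assumptions for illustration; a real setup would more likely put this in the proxy or HTTP client configuration.

```python
import asyncio

BACKEND_TIMEOUT_S = 0.05  # hypothetical budget for one backend call


async def slow_backend(delay_s):
    """Stand-in for a backend call; the delay simulates a CPU spike."""
    await asyncio.sleep(delay_s)
    return "ok"


async def handle_request(delay_s):
    """Front-end handler: fail fast instead of waiting indefinitely."""
    try:
        body = await asyncio.wait_for(
            slow_backend(delay_s), timeout=BACKEND_TIMEOUT_S
        )
        return 200, body
    except asyncio.TimeoutError:
        # The user gets a clear error they can retry,
        # rather than a spinner that never resolves.
        return 504, "backend timed out"


def run(delay_s):
    return asyncio.run(handle_request(delay_s))


if __name__ == "__main__":
    print(run(0.0))  # healthy backend
    print(run(1.0))  # backend stuck well past the budget
```

The design choice is the one argued above: a bounded wait turns a hung backend into an explicit error the front end can show or retry, at the cost of occasionally cutting off a request that would have finished a little later.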