| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by PaulKeeble 188 days ago

You run these tools and you find all the maximums for weeks of traffic and so you set them down to minimise cost and all is well until the event. The event doesn't really matter, it causes an increase in traffic processing time and suddenly every service needs more memory to hold all transactions, now instead they fail with out of memory and disappear and suddenly all your pods are in restart loops unable to cope and you have an outage.

The company wasting 20% extra memory on the other hand is still selling and copes with the slower transaction speed just fine.

Not sure over provisioning memory is really just waste when we have dynamic memory based languages, which is all modern languages not in real time safety critical environments.

1 comments

esseph 188 days ago

This acts like one app is the only app running on device, which in the case of k8s, clearly isn't the case.

If you want to get scheduled on a node for execution after a node failure, your resource requests need to fit / pack somewhere.

The more accurately modeled application limits are, the better cluster packing gets.

This impacts cost.

link