
by willvarfar 4750 days ago

Actually not to my thinking:

If program A hits swap, it means that cold pages are written to swap so that A can get those pages; this initial writing is done by program A, it's true. But A may not be the cause of the problem; A is just the straw that breaks the camel's back.

And those pages that got written to swap likely belong to others, and they pay the cost when they need those pages back...

In my practical experience, when one of my apps hits swap, the whole system becomes distressed. It is not isolated to the 'offender'.

You can of course avoid swap, but with your OS doing overcommit on memory allocations, you are just inviting a completely different way of failing and that too is hard to manage. You end up having to know a lot about your deployment environment and ring-fence memory between components and manage their budgets. If you want to have both app code and cache on the same node - and that's a central tenet of groupcache - then you have to make sure everything is under-dimensioned because the needs of one cannot steal from the other; your cache isn't adaptive.
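To make the ring-fencing point concrete, here is a minimal sketch of a cache with a fixed byte budget. The class name `BudgetedCache` and its methods are hypothetical, made up for illustration; this is not groupcache's API. The budget is set once at construction, so the cache evicts its own least recently used entries rather than borrowing memory the app code happens not to be using.

```python
from collections import OrderedDict


class BudgetedCache:
    """LRU cache with a fixed byte budget (hypothetical, for illustration only)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes  # the ring-fenced budget; never grows or shrinks
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> bytes, oldest entry first

    def get(self, key):
        value = self._items.get(key)
        if value is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value: bytes) -> bool:
        if len(value) > self.max_bytes:
            return False  # can never fit inside the budget
        old = self._items.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        # Evict least recently used entries until the new value fits.
        while self.used_bytes + len(value) > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._items[key] = value
        self.used_bytes += len(value)
        return True
```

With a 10-byte budget, storing two 5-byte values and then a 3-byte one forces an eviction even if the rest of the machine has plenty of free memory, which is the non-adaptive behaviour described above.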

That's why I built a system to do caching centrally at the OS level.

I hope someone like Brad is browsing here and can make some kind of piercing observation I've missed.

2 comments

lmm 4750 days ago

I understand google solves that problem by not enabling swap on their servers.


gohrt 4750 days ago

That's rather common. If swapping can harm your application, then don't swap. On a machine where a slowdown is tolerable (temporarily, on a desktop), swap is fine. On a machine whose entire purpose is to serve as a fast cache in front of slow storage, swapoff and fall back to shedding or queuing requests at the frontend.
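A rough sketch of the shedding half of that, assuming a hypothetical `LoadShedder` wrapper around whatever handler the frontend calls. The idea is to cap the number of requests in flight and reject the rest immediately, instead of letting them pile up and push the machine toward memory pressure. The cap value is something you would have to pick for your own deployment.

```python
import threading


class LoadShedder:
    """Rejects work once max_in_flight requests are already being handled.

    Hypothetical helper for illustration; not taken from any real frontend.
    """

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_handle(self, handler, *args):
        """Return (True, result) if handled, or (False, None) if shed."""
        # Non-blocking acquire: if no slot is free, shed instead of queuing.
        if not self._slots.acquire(blocking=False):
            return (False, None)
        try:
            return (True, handler(*args))
        finally:
            self._slots.release()
```

A caller that gets `(False, None)` back would typically answer with an error such as HTTP 503 so the client can retry elsewhere. Queuing instead of shedding would mean a blocking acquire with a timeout, at the cost of holding the waiting requests in memory.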


mh- 4750 days ago

Without any specific knowledge of Google's practices, I can say this is certainly true - this is standard nowadays.


happyhappy 4750 days ago

That is my experience as well. In my thought experiment the 'offender' would be a server instance, not a process running among other applications on a single machine. Applications that hit swap often have memory leaks, and hitting swap is then just a matter of time. A cascading failure may be preventable, however.
