
Actually, since you have no idea what is causing the load (it could be waiting on network IO), I think running your production system in that shape is not recommended. Out of curiosity, in what situation is it OK to have significantly more things waiting to run than you have capacity for? Seems like bad capacity planning to me. Anyway, this is how it was done (keeping the normalized load around 1) at my previous gig, where we had ~5000 nodes, and it worked fine. I work on Hadoop clusters nowadays, and any time we run into a load of 100+ there is severe degradation in the service: timeouts and so on. In my experience, a high normalized load sustained over time (not talking about 1-minute spikes) should be avoided.
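
To be concrete about what I mean by normalized load: it is the load average divided by the number of CPUs. Here is a rough sketch in Python, not the tooling we actually used. The function names and the threshold of 1.0 are my own choices for illustration, and it looks at the 15-minute average so that short spikes are ignored.

```python
import os


def normalized_load(load_avg, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


def is_overloaded(load_avg, cpu_count, threshold=1.0):
    """True if the normalized load exceeds the threshold.

    The default threshold of 1.0 is an assumption for illustration;
    pick whatever fits your own capacity planning.
    """
    return normalized_load(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    one_min, five_min, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    print(f"15-minute normalized load: {normalized_load(fifteen_min, cpus):.2f}")
    if is_overloaded(fifteen_min, cpus):
        print("sustained load is above capacity")
```

With this, a load of 100 on a 16-core node gives a normalized load of 6.25, which is the kind of situation I am describing above.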