Hacker News
by bayindirh 670 days ago
Yeah, people tend to think of server utilization as black and white.

Look, we're using just 50% of that RAM. Look, there're two cores that are almost idle.

No & No. The rest of the RAM is your secret for instant responses, and that spare CPU is there so I can do system management without you noticing, or to absorb the odd torrent of requests we get semi-regularly (e.g., a /. hug of death. Remember?).


I need to find a really good intro to queuing theory to send people to. A full queue is a slow queue. You actually want to aim for about 65% utilization.
This might be too basic, but I found this blog post to be an incredible introduction to queues: https://encore.dev/blog/queueing
Also, there was a formula for determining the optimal cache size. I forget the name all the time. IIRC, in the end, caching the 10 most popular items was enough to respond to 95% of your queries without hitting the disk.
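This is not the formula mentioned above (its name isn't given), just a rough way to play with the idea. The sketch assumes request popularity follows a Zipf-like distribution, which is an assumption and not something stated in the thread; `top_k_hit_ratio`, `n_items`, and the exponent `s` are made-up names for illustration.

```python
def top_k_hit_ratio(k: int, n_items: int, s: float = 1.0) -> float:
    """Fraction of requests served from a cache holding the k most popular items.

    Assumes (hypothetically) that the item at popularity rank r is requested
    with probability proportional to 1 / r**s, i.e. a Zipf-like distribution.
    """
    if not 0 <= k <= n_items:
        raise ValueError("k must be between 0 and n_items")
    weights = [1.0 / rank ** s for rank in range(1, n_items + 1)]
    return sum(weights[:k]) / sum(weights)


# How much traffic do the 10 hottest items cover for different skews?
for s in (1.0, 2.0, 3.0):
    ratio = top_k_hit_ratio(10, n_items=10_000, s=s)
    print(f"s={s}: top 10 of 10,000 items cover {ratio:.1%} of requests")
```

How close you get to the 95% figure depends entirely on how skewed the real traffic is, so measure your own request distribution before trusting any single number.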
If the numbers from The Phoenix Project are to be trusted, a loose estimate is that the time spent in the queue is proportional to the ratio of utilized to unutilized resources. For example, 50% used and 50% unused is 50:50 = 1 unit of time; 99% used is 99:1 = 99 units of time.
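To see how quickly that ratio blows up, here is a small sketch of the busy:idle rule of thumb. The function name `relative_wait` is made up for illustration. As far as I know, the same expression gives the mean time waiting in queue, measured in units of the average service time, for a single-server queue with random arrivals and service times (M/M/1); treat that model as an assumption, not a description of any particular system.

```python
def relative_wait(utilization: float) -> float:
    """Busy-to-idle ratio: the loose wait-time estimate from the comment above.

    utilization is a fraction in [0, 1); at 1.0 the estimate is unbounded.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)


# Tabulate the estimate across a range of utilization levels.
for pct in (50, 65, 80, 90, 95, 99):
    wait = relative_wait(pct / 100)
    print(f"{pct}% busy -> about {wait:.1f} units of wait")
```

Going from 50% to 65% busy roughly doubles the estimate, while going from 90% to 99% multiplies it by about eleven, which is why leaving headroom pays off.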