by jltsiren 928 days ago
I'm not sure what you mean by datacenter workloads.

The work I do could be called data science and data engineering. Outside some fairly trivial (or highly optimized) sequential processing, the CPU just isn't fast enough to saturate memory bandwidth. For anything more complex, the data you want to load is either in cache (and bandwidth doesn't matter) or it isn't (and you probably care more about latency).

2 comments

I had two dual 18-core Xeon web servers with seemingly identical hardware and software setups, but one was doing 1100 req/s and the other 500-600.

After some digging, I realized that the faster one had 8x8GB RAM modules and the slower one had 2x32GB.

I then did some benchmarking and found that it really depends on the workload. On the 2x32GB machine, the www app was 50% slower, Memcache 400% slower, Blender 5% slower, and file compression 20% slower. Most single-threaded tasks showed no difference.

The takeaway was that workloads want some bandwidth per core, and shoving more cores into servers doesn't increase performance once you hit memory bandwidth limits.
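
To put rough numbers on "bandwidth per core", here is a back-of-the-envelope sketch for the 8x8GB and 2x32GB configurations. It assumes every module sits on its own memory channel and that each channel peaks at 19.2 GB/s (the DDR4-2400 figure). Neither the memory speed nor the slot layout was stated above, so both are assumptions, and `per_core_bandwidth_gbs` is just a name made up for this example.

```python
def per_core_bandwidth_gbs(populated_channels, cores, channel_gbs=19.2):
    """Theoretical peak DRAM bandwidth per core, in GB/s.

    channel_gbs defaults to 19.2 GB/s (DDR4-2400), which is an assumption;
    the actual memory speed of the servers is unknown.
    """
    return populated_channels * channel_gbs / cores


CORES = 36  # two sockets with 18 cores each

# Assumption: one module per channel, so 8 modules fill 8 channels
# across both sockets, while 2 modules fill only 2.
eight_modules = per_core_bandwidth_gbs(8, CORES)
two_modules = per_core_bandwidth_gbs(2, CORES)

print(f"8x8GB:  {eight_modules:.2f} GB/s per core")
print(f"2x32GB: {two_modules:.2f} GB/s per core")
print(f"ratio:  {eight_modules / two_modules:.1f}x")
```

Under these assumptions the 2x32GB box has a quarter of the peak bandwidth per core whatever the real channel speed is, since the speed cancels out of the ratio. Whether a given workload feels that depends on how much bandwidth it actually pulls per core.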

This seems very unlikely. The CPU is almost always bottlenecked by memory.
It's usually bottlenecked by memory latency, not bandwidth. People talk about bandwidth because it's a simple number that keeps growing over time. Latency stays at ~100 ns because DRAM is not getting any faster. Bandwidth can become a real constraint if your single-threaded code is processing more than a couple of gigabytes per second. But it usually takes a lot of micro-optimization to do anything meaningful at such speeds.
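
A rough way to see why latency caps a single thread at a few gigabytes per second is Little's law: sustained throughput is the number of bytes in flight divided by the latency. The sketch below uses the ~100 ns figure from above, an assumed 64-byte cache line, and an assumed limit of 10 outstanding cache misses per core. The last number varies by microarchitecture and is only illustrative, and `latency_bound_gbs` is a hypothetical helper name.

```python
def latency_bound_gbs(outstanding_misses, latency_ns=100.0, line_bytes=64):
    """Upper bound on one core's DRAM throughput from Little's law, in GB/s.

    throughput = bytes in flight / latency. One byte per nanosecond is
    1 GB/s, so no unit conversion is needed.
    """
    return outstanding_misses * line_bytes / latency_ns


# Dependent loads (pointer chasing): only one miss in flight at a time.
print(f"1 miss in flight:    {latency_bound_gbs(1):.2f} GB/s")

# Independent loads: assume the core can keep 10 misses in flight.
print(f"10 misses in flight: {latency_bound_gbs(10):.2f} GB/s")
```

With these assumptions, code that chases pointers stays well under 1 GB/s, and even code with plenty of independent loads tops out at a few GB/s per thread unless prefetching keeps more lines in flight.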