by wolf550e
2850 days ago
Not every workload is memory-bandwidth-bound like his "make -j16" compile. Some workloads need low memory latency or fast inter-core (and inter-socket) operations (e.g. RDBMS OLTP), some need CPU throughput (e.g. HPC), and some need the best possible single-thread CPU performance (e.g. some gaming). As he wrote, CPUs are most efficient (compute per watt) at a specific frequency, and if his CPU mostly waits for RAM, the work can be done at low power. It's probably possible to create x86-64 CPUs with narrower backends (fewer execution units), with microcode-emulated 128- and 256-bit registers/operations (and maybe even an emulated FPU), and get a cheaper and faster build server, if it were economical to fab such narrow-use-case chips (those would be good for redis/memcached too, I imagine).
He actually ran `make -j32`, not 16, which is going to absolutely devastate the cache.
`make -j<number of cores x 2>` was a good rule of thumb back when you had 1/2/4 physical CPUs with their own sockets on a motherboard and spinning rust hard disks. A lot of "compilation" time was reading the source code off the disk. But it doesn't make any sense anymore with so many cores, hyperthreading, and SSDs that serve you the file in milliseconds.
If he's bandwidth-limited, he would probably gain a significant performance improvement by reducing the number of processes.
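A rough way to check this rather than guess is to time the same clean build at a few different `-j` values. Here is a minimal Python sketch, assuming the project has a Makefile with a working `clean` target. The helper names (`candidate_job_counts`, `time_build`, `sweep`) are made up for illustration and have nothing to do with his actual setup.

```python
import os
import subprocess
import time


def candidate_job_counts(cores):
    """Job counts to try: start at 2x cores and halve down to about cores/4."""
    counts = []
    floor = max(1, cores // 4)
    n = cores * 2
    while n >= floor:
        counts.append(n)
        n //= 2
    return counts


def time_build(jobs, build_cmd=("make",), clean_cmd=("make", "clean")):
    """Run a clean build with -j<jobs> and return wall-clock seconds.

    Assumption: `make clean` exists and fully resets the build tree.
    """
    subprocess.run(list(clean_cmd), check=True, stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    subprocess.run([*build_cmd, f"-j{jobs}"], check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(cores=None):
    """Time the build at each candidate job count; returns {jobs: seconds}."""
    # os.cpu_count() reports logical CPUs, so hyperthreads are included.
    cores = cores or os.cpu_count() or 1
    return {jobs: time_build(jobs) for jobs in candidate_job_counts(cores)}
```

Calling `sweep()` from the project directory and comparing the timings should show whether fewer processes actually help on a given machine. A single run per setting is noisy (page cache, thermal state), so repeating each one a few times would give a fairer picture.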