Hacker News
by MauranKilom 1813 days ago
Fun tangential anecdote regarding how interconnected and unintuitive CPU performance can be: I once made something run 20% faster by spawning a thread that did nothing but spin (i.e. while (true);).

I was trying to optimize some FEM code, toying with (hardcoded) solver parameters. On one console I had it spitting out the wall clock durations of time steps as the simulation was running, while on the other I was preparing the next run. I start compiling another version, and inexplicably the simulation in the other console gets faster. Like, 10%-20% less time taken per time step. "That must have been coincidence. There's no way the simulation got faster by compiling something in parallel." But curiosity got the better of me and I still investigated.

Watching the CPU speed with CPU-Z, it turned out that the simulation was indeed getting down-clocked, and that compiling something in parallel made the CPU run faster, speeding up the simulation too. WTF? And indeed, I could make the entire simulation run significantly faster by calling

    std::thread([](){ while (true); }).detach();
at the start of main.
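
For anyone wanting to try this without leaving an unstoppable thread behind, here is a minimal sketch of a stoppable variant. The `Spinner` class and its members are my own illustrative names, not part of the original code, and I'm assuming a C++11 or later compiler with thread support.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the original code): keeps one core busy
// until the object is destroyed, then joins the thread cleanly.
class Spinner {
public:
    Spinner() : stop_(false), thread_([this] {
        // The atomic load gives the loop an observable operation, so the
        // compiler cannot treat it as a loop it may assume terminates.
        while (!stop_.load(std::memory_order_relaxed)) {
        }
    }) {}

    ~Spinner() {
        stop_.store(true, std::memory_order_relaxed);
        thread_.join();  // a joinable std::thread must be joined or detached
    }

    Spinner(const Spinner&) = delete;
    Spinner& operator=(const Spinner&) = delete;

    // True while the spinning thread is owned by this object.
    bool running() const { return thread_.joinable(); }

private:
    std::atomic<bool> stop_;  // declared first so it is initialized first
    std::thread thread_;
};
```

Declaring `Spinner keep_awake;` at the top of main should have the same effect as the one-liner while still letting the program exit normally.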

Why? Well, the simulation happens to be extremely memory-bound (sparse mat-vec multiplication in inner loop). So the CPU is mostly waiting around for data to arrive. Apparently the CPU downclocks as a result. That would be fine, if not for the fact that the uncore/memory subsystem clock speed is directly tied to the current CPU speed. That's right: The program was memory-bound, hence the CPU clocked down, hence the uncore clocked down, hence memory accesses became slower.
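
To illustrate why this kind of inner loop waits on memory so much, here is a rough sketch of a sparse mat-vec product in CSR (compressed sparse row) form. The storage format, the `CsrMatrix` struct and the `spmv` function are assumptions for illustration; the actual FEM code may well have looked different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container; field names are illustrative only.
struct CsrMatrix {
    std::vector<std::size_t> row_ptr;  // size rows + 1, start of each row
    std::vector<std::size_t> col_idx;  // column of each stored value
    std::vector<double> values;        // the stored (nonzero) entries
};

// Computes y = A * x.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    const std::size_t rows = a.row_ptr.empty() ? 0 : a.row_ptr.size() - 1;
    std::vector<double> y(rows, 0.0);
    for (std::size_t r = 0; r < rows; ++r) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            // One multiply-add per stored value, but three loads: the value,
            // its column index, and an indirectly addressed element of x.
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

There is very little arithmetic per byte loaded, so for matrices much larger than the caches the core spends most of its time stalled on loads rather than computing.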

Knowing that feedback loop, it makes perfect sense that keeping the CPU busy with a spinning thread improves performance. But it's still one big wtf.

This problem eventually went away as we parallelized more and more of the simulation, giving the CPU less reason to clock down. But for related reasons, the simulation still runs faster if you prevent hyperthreading (either by disabling it in BIOS or having num threads = num hardware cores). More threads don't improve memory bandwidth and the hyperthread pairs just step on each other's toes.
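
A rough sketch of the "num threads = num hardware cores" choice, under a loud assumption: standard C++ only reports logical threads via `std::thread::hardware_concurrency()`, so the physical core count is estimated here by assuming two hardware threads per core. The helper names are hypothetical, and a real setup would more likely query the OS or a library for the actual topology.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: estimates physical cores from the logical count.
// ASSUMPTION: every core exposes `smt_ways` hardware threads (2 with
// typical hyperthreading); the standard library cannot confirm this.
unsigned physical_core_estimate(unsigned logical_threads, unsigned smt_ways = 2) {
    if (smt_ways == 0) smt_ways = 1;  // guard against division by zero
    return std::max(1u, logical_threads / smt_ways);
}

// Worker count to use when hyperthread pairs should not share a core.
unsigned default_worker_count() {
    // hardware_concurrency() is only a hint and may return 0 if unknown;
    // the estimate above then falls back to a single worker.
    return physical_core_estimate(std::thread::hardware_concurrency());
}
```

Note that limiting the thread count does not by itself stop the scheduler from placing two workers on the same physical core; that would additionally need thread pinning, which is OS-specific.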

1 comment

I'm confused, how is CPU speed tied to memory? AFAIK, memory is tied to CPU base speed which is almost always 100 MHz. The CPU then just scales its own multiplier.

Northbridge frequency (as shown by CPU-Z) is correlated to CPU speed in my experiments. It's not one to one, but NB frequency definitely varies by a factor of two depending on CPU load.

What exact mechanism controls this is not clear to me (and I'm actually not sure if it's clear to anyone outside of Intel - the one paper [0] I found at the time was based on reverse engineering experiments). Nevertheless, CPU clock speed definitely affects Northbridge speed, as proven by the latter increasing from spinning a thread that never touches memory.

[0]: https://tu-dresden.de/zih/forschung/ressourcen/dateien/proje...

See section V.A:

> The results [...] indicate that uncore frequencies – in addition to EPB and stall cycles – depend on the core frequency of the fastest active core on the system.

(That conclusion is fully in line with my own observations.)

Also see the corresponding patent linked in the paper: https://patents.google.com/patent/WO2013137862A1/en