by injinj 1875 days ago
I have a graph for this:

https://github.com/raitechnology/raikv/blob/master/graph/mt_...

The CPU in this case is a Threadripper 3970X: 32 cores, 64 SMT threads.

My experience is this: when the L3 cache is effective, hiding memory latency with prefetch works well across SMT threads. If a hash table lookup requires a chain walk, the SMT latency hiding is less effective, because the prefetch location calculated from the hash is not where the key is actually found. I also tried prefetching multiple slots as the table load increased, but I couldn't get that to be as effective as prefetching a single slot.
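
To make the single-slot prefetch idea concrete, here is a minimal sketch in C. It is not raikv's code: the open-addressing layout with linear probing, the names `slot_t`, `table_t`, `mix64`, `table_get_batch`, and the `BATCH` size are all made up for illustration, and `__builtin_prefetch` assumes GCC or Clang. Each batch first hashes every key and prefetches only its home slot, then does the actual probing. The `extra_probes` counter shows how many steps went past the prefetched slot, which is the chain-walk case where the prefetch helps less.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EMPTY_KEY UINT64_MAX   /* sentinel for an unused slot (assumed layout) */
enum { BATCH = 8 };            /* lookups overlapped per batch (arbitrary choice) */

typedef struct { uint64_t key, val; } slot_t;
typedef struct { slot_t *slots; size_t mask; } table_t;  /* mask = capacity - 1 */

/* Hypothetical 64-bit mixer; any reasonable hash would do here. */
static inline uint64_t mix64(uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* Use caller-provided storage; capacity must be a power of two. */
void table_init(table_t *t, slot_t *storage, size_t capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    t->slots = storage;
    t->mask = capacity - 1;
    for (size_t i = 0; i < capacity; i++) storage[i].key = EMPTY_KEY;
}

/* Linear-probing insert or update. Returns 1 on success, 0 if the table is full. */
int table_put(table_t *t, uint64_t key, uint64_t val) {
    size_t i = (size_t)mix64(key) & t->mask;
    for (size_t step = 0; step <= t->mask; step++) {
        if (t->slots[i].key == EMPTY_KEY || t->slots[i].key == key) {
            t->slots[i].key = key;
            t->slots[i].val = val;
            return 1;
        }
        i = (i + 1) & t->mask;
    }
    return 0;
}

/* Batched lookup. Returns the number of keys found; vals[j] is written only
 * for keys that are found. *extra_probes counts steps beyond the home slot. */
size_t table_get_batch(const table_t *t, const uint64_t *keys, uint64_t *vals,
                       size_t n, size_t *extra_probes) {
    size_t found = 0;
    *extra_probes = 0;
    for (size_t base = 0; base < n; base += BATCH) {
        size_t m = (n - base < BATCH) ? n - base : BATCH;
        size_t home[BATCH];

        /* Phase 1: compute each home slot and prefetch only that one slot. */
        for (size_t j = 0; j < m; j++) {
            home[j] = (size_t)mix64(keys[base + j]) & t->mask;
            __builtin_prefetch(&t->slots[home[j]], 0, 3);  /* read, keep in cache */
        }

        /* Phase 2: probe. Steps past the home slot may touch cache lines
         * that the prefetch above did not cover. */
        for (size_t j = 0; j < m; j++) {
            size_t i = home[j];
            for (size_t step = 0; step <= t->mask; step++) {
                const slot_t *s = &t->slots[i];
                if (s->key == keys[base + j]) {
                    vals[base + j] = s->val;
                    found++;
                    break;
                }
                if (s->key == EMPTY_KEY) break;   /* key is not in the table */
                i = (i + 1) & t->mask;
                (*extra_probes)++;
            }
        }
    }
    return found;
}
```

At low load `extra_probes` stays near zero and the prefetched slot is usually the hit. As the table fills, more lookups walk past the home slot. In this particular layout the slots are 16 bytes, so a neighbor often shares the prefetched cache line; a chained or larger-slot layout would miss more often.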