by injinj
1875 days ago
I have a graph for this: https://github.com/raitechnology/raikv/blob/master/graph/mt_... The CPU in this case is a Threadripper 3970X (32 cores, 64 SMT threads). My experience is this: when the L3 cache is effective, hiding memory latency via prefetch works well across SMT threads. If a hashtable lookup requires a chain walk, the SMT latency hiding is less effective, because the calculated prefetch location is not where the key is actually found. As the load increased, I couldn't get prefetching multiple slots to be as effective as prefetching a single slot.
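To make the single-slot prefetch idea concrete, here is a rough sketch in C. It is not the raikv code: the table layout, `hash64`, `ht_insert`, `ht_lookup_batch`, and the batch size are all hypothetical names and choices made up for illustration, and it assumes GCC or Clang for `__builtin_prefetch`. The point is only the shape: compute the home slot for a batch of keys, prefetch each one, then probe. When a key sits in its home slot the prefetch has already pulled in the right line; when the probe has to walk further, those later slots were never prefetched, which is the case described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HT_SIZE 1024u             /* hypothetical table size, power of two */
#define HT_MASK (HT_SIZE - 1u)
#define BATCH   8u                /* hypothetical lookup batch size */

typedef struct { uint64_t key; uint64_t val; } slot_t;  /* key 0 means empty */

static slot_t table[HT_SIZE];

/* 64-bit mixer (murmur3-style finalizer), used only to spread keys. */
static uint64_t hash64(uint64_t k) {
  k ^= k >> 33; k *= 0xff51afd7ed558ccdULL;
  k ^= k >> 33; k *= 0xc4ceb9fe1a85ec53ULL;
  k ^= k >> 33;
  return k;
}

/* Linear-probing insert; returns 1 on success, 0 if the table is full. */
static int ht_insert(uint64_t key, uint64_t val) {
  assert(key != 0);
  size_t i = hash64(key) & HT_MASK;
  for (size_t n = 0; n < HT_SIZE; n++, i = (i + 1) & HT_MASK) {
    if (table[i].key == 0 || table[i].key == key) {
      table[i].key = key;
      table[i].val = val;
      return 1;
    }
  }
  return 0;
}

/* Batched lookup of up to BATCH keys. Stage 1 prefetches only the computed
 * home slot of each key; stage 2 probes. Slots reached by walking past the
 * home slot were not prefetched. Returns the number of keys found; vals[j]
 * is set to 0 for a miss. */
static size_t ht_lookup_batch(const uint64_t *keys, uint64_t *vals, size_t n) {
  size_t pos[BATCH];
  size_t hits = 0;
  assert(n <= BATCH);

  for (size_t j = 0; j < n; j++) {            /* stage 1: hash and prefetch */
    pos[j] = hash64(keys[j]) & HT_MASK;
    __builtin_prefetch(&table[pos[j]], 0, 3); /* read access, keep in cache */
  }
  for (size_t j = 0; j < n; j++) {            /* stage 2: probe */
    size_t i = pos[j];
    vals[j] = 0;
    for (size_t step = 0; step < HT_SIZE; step++, i = (i + 1) & HT_MASK) {
      if (table[i].key == keys[j]) { vals[j] = table[i].val; hits++; break; }
      if (table[i].key == 0) break;           /* empty slot: key is absent */
    }
  }
  return hits;
}
```

Calling `ht_lookup_batch(keys, vals, n)` with `n` up to `BATCH` fills `vals` and returns the hit count. Prefetching more than the home slot per key would just mean issuing extra `__builtin_prefetch` calls in stage 1, at the cost of more memory traffic per lookup.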