| HN Mirror



	by feffe 1872 days ago
	The memory latency hiding also works with 2-way SMT. I worked on networking software that did a per-packet session lookup in large hash tables. In this application, SMT on a Sandy Bridge core gave 40% better performance, which is higher than the figures usually mentioned. So for memory-bound (as in cache-miss-heavy) applications, SMT is a boon.

2 comments

injinj 1872 days ago

I have a graph for this:

https://github.com/raitechnology/raikv/blob/master/graph/mt_...

The CPU in this case is a Threadripper 3970X: 32 cores, 64 SMT threads.

My experience is this: when the L3 cache is effective, memory latency hiding via prefetch works well across SMT threads. If the hash table load requires a chain walk, the SMT latency hiding is less effective because the calculated prefetch location is not the actual hit. As the load increased, I couldn't get prefetching multiple slots to be as effective as prefetching a single slot.
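
For readers who want to see the single-slot prefetch idea in code, here is a minimal sketch. It is not taken from raikv; the table layout, the `BATCH` size, and every name in it (`table_t`, `table_get_batch`, `mix64`) are hypothetical. It assumes GCC or Clang for `__builtin_prefetch`, an open-addressing table with linear probing, and key 0 reserved as the empty marker. The first pass computes each key's home slot and prefetches it; the second pass probes. When a lookup has to walk past the home slot, the later slots may sit in cache lines that were never prefetched, which is the case described above where the calculated location is not the actual hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH 8  /* hypothetical batch size, not tuned */

typedef struct {
    uint64_t key;    /* 0 marks an empty slot (assumption of this sketch) */
    uint64_t value;
} slot_t;

typedef struct {
    slot_t *slots;
    size_t  mask;    /* capacity - 1; capacity must be a power of two */
} table_t;

/* 64-bit mixer (the splitmix64 finalizer), used here as the hash. */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Uses caller-provided storage and marks every slot empty. */
static void table_init(table_t *t, slot_t *storage, size_t capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    for (size_t i = 0; i < capacity; i++)
        storage[i].key = 0;
    t->slots = storage;
    t->mask = capacity - 1;
}

/* Linear-probing insert. The caller must keep at least one slot empty,
   otherwise this loop and the lookup loop would not terminate. */
static void table_put(table_t *t, uint64_t key, uint64_t value) {
    assert(key != 0);
    size_t i = mix64(key) & t->mask;
    while (t->slots[i].key != 0 && t->slots[i].key != key)
        i = (i + 1) & t->mask;
    t->slots[i].key = key;
    t->slots[i].value = value;
}

/* Looks up n keys (n <= BATCH). On a hit, found[i] is 1 and out[i] holds
   the value; on a miss, found[i] is 0 and out[i] is left untouched. */
static void table_get_batch(const table_t *t, const uint64_t *keys, size_t n,
                            uint64_t *out, int *found) {
    size_t home[BATCH];
    assert(n <= BATCH);

    /* Pass 1: compute each home slot and prefetch only that one slot. */
    for (size_t i = 0; i < n; i++) {
        home[i] = mix64(keys[i]) & t->mask;
        __builtin_prefetch(&t->slots[home[i]], 0, 3);  /* read, keep in cache */
    }

    /* Pass 2: probe. Any walk beyond the home slot may touch lines
       that pass 1 did not prefetch. */
    for (size_t i = 0; i < n; i++) {
        size_t j = home[i];
        found[i] = 0;
        while (t->slots[j].key != 0) {
            if (t->slots[j].key == keys[i]) {
                out[i] = t->slots[j].value;
                found[i] = 1;
                break;
            }
            j = (j + 1) & t->mask;
        }
    }
}
```

Whether this helps at all depends on the table being much larger than the cache and on the batch being large enough to overlap several misses, so treat it as something to measure rather than a guaranteed win.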

magicalhippo 1872 days ago

I tested this some years ago on a ray tracer, and got a tad over 50% more speed with HT enabled compared to disabled.

As you say, the ray tracer did a lot of cache missing, interspersed with a fair bit of calculation. I'm guessing this is close to the ideal workload, as far as non-synthetic benchmarks go.