| HN Mirror



	by undfg 1810 days ago
	Interesting. Does this mean that if you are not going to use all HT threads it's better to turn off HT?

3 comments

jrockway 1810 days ago

SMT is usually a throughput win, and usually a latency loss. I was playing with my Threadripper a while back and for C++ builds of large projects, HT results in about a 10% improvement in compilation speed. 10% is a big deal and you should take it. The downside is that games had noticeably lower framerates even with the rest of the CPU idle (at least the games I play are bound by single-thread performance across maybe 2-4 cores). I kind of blame Windows's scheduler there, since it should be able to say "hey, a game is running, don't schedule anything on the same physical core that has a game thread running on it", but I don't think it does. It might schedule both game threads on the same physical core, and then they contend with each other and each run 40% slower.

Also be careful about memory -- 64GB wasn't enough for 64 concurrent clang runs. You need a little bit more (but of course 128GB is the next installable increment).

(I can also see more threads aggravating other resource constraints, notably disk IOPS, but I didn't notice a problem there myself. It's also possible that SMT increases power use and so decreases turbo speeds, and that might have an impact. I didn't measure that when I was testing.)

For me, I keep SMT turned off. The latency is more important than the throughput for my workstation, but if you do full builds of C++ regularly, you might want it on. Use the 10% time you're getting back to switch to a build system that can cache things, though.
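
To make the memory point concrete, here is a rough sketch of capping the number of parallel compile jobs by RAM as well as by hardware threads. The function name and the 1.5 GiB-per-job figure are assumptions for illustration only, not measurements from this thread; you would need to measure the peak memory of your own clang processes.

```python
def pick_job_count(hw_threads: int, total_mem_gib: float, gib_per_job: float) -> int:
    """Return the largest job count that fits both the CPU and the RAM budget.

    gib_per_job is a hypothetical per-compiler-process estimate that you
    have to measure for your own project.
    """
    if hw_threads < 1 or gib_per_job <= 0:
        raise ValueError("need at least one thread and a positive per-job estimate")
    # How many jobs fit in memory if each one peaks at gib_per_job.
    by_memory = int(total_mem_gib // gib_per_job)
    # Never go above the thread count, and always run at least one job.
    return max(1, min(hw_threads, by_memory))


# Example with made-up numbers: 64 hardware threads, 64 GiB RAM, 1.5 GiB per job.
jobs = pick_job_count(64, total_mem_gib=64, gib_per_job=1.5)
```

With those placeholder numbers the memory limit, not the thread count, decides the job count, which is the situation described above.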

toast0 1810 days ago

> 64GB wasn't enough for 64 concurrent clang runs. You need a little bit more (but of course 128GB is the next installable increment).

That's not really true: you can mix and match memory. You might not get ideal bandwidth, but for a lot of uses it's just fine.

namibj 1810 days ago

Notably, triple-rank configurations work just fine. It's just not trivial to find both a dual-rank module and a single-rank module with the same components used on both (so all the sub-timings match and performance will be nice and proper).

bserge 1810 days ago

FWIW, you can set core affinity for your game process.
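
A minimal sketch of what that could look like on Linux, assuming the standard sysfs topology files and Python's `os.sched_setaffinity` (which is not available on Windows, where you would use Task Manager or similar instead). The helper names here are made up for illustration. It picks one logical CPU per physical core and restricts a process to that set, so two of its threads never share a core.

```python
import os


def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux CPU list such as "0,32" or "0-1" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists: list[str]) -> set[int]:
    """Keep the lowest-numbered logical CPU from each group of SMT siblings."""
    return {parse_cpu_list(s)[0] for s in sibling_lists if s.strip()}


def read_sibling_lists() -> list[str]:
    """Read each logical CPU's sibling list from Linux sysfs."""
    base = "/sys/devices/system/cpu"
    lists = []
    for name in os.listdir(base):
        path = os.path.join(base, name, "topology", "thread_siblings_list")
        if name.startswith("cpu") and name[3:].isdigit() and os.path.exists(path):
            with open(path) as f:
                lists.append(f.read())
    return lists


def pin_to_physical_cores(pid: int) -> set[int]:
    """Restrict pid to one logical CPU per physical core (Linux only)."""
    cpus = one_cpu_per_core(read_sibling_lists())
    os.sched_setaffinity(pid, cpus)
    return cpus
```

Note that this only constrains the pinned process. Other processes can still be scheduled on the sibling threads, so it does not fully solve the contention problem described in the parent comment.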

wtallis 1810 days ago

Depends on the machine. The earliest implementations of HT worked by statically partitioning various caches and other resources in the processor core in half, which meant that a single-threaded process really could slow down by having HT enabled but not actively used. Newer desktop-class processors tend to have no significant downsides to leaving HT enabled, but there might still be some SMT implementations on niche products that don't handle this well.

temac 1810 days ago

There still are (and probably always will be) some workloads where using HT makes the whole task take longer. But unless that is the only kind of load you run, optimizing on modern CPUs is simply a matter of, e.g., running as many jobs as you have physical cores instead of hardware threads when you run those loads.

Dylan16807 1810 days ago

> The earliest implementations of HT worked by statically partitioning

Do you mean SMT in general? I don't think hyperthreading specifically has ever done that, but if I'm wrong I'd love to know more. (And AMD's version falls under "newer desktop-class processors")

wtallis 1810 days ago

I meant Intel HT specifically, but I'm going off memory here, and having trouble finding details on those old parts. Agner Fog's current microarchitecture manual doesn't mention HT in its discussion of the P4, but it does include at least one mention of static partitioning of the decoded op queue in the Atom core.

It also describes several instances where Intel's desktop cores used to devote specific resources to each thread on alternating clock cycles, but newer cores have progressively removed those limitations. However, these probably don't quite fit my original assertion because if the OS has literally HALTed one of the virtual CPUs, these alternating clock cycle limitations may have been temporarily removed.

hermitdev 1810 days ago

My first experience with HT was on my dual P4 Xeons. Performance with HT on was noticeably terrible. It felt like a dog and pony show. It was best to keep HT disabled then. I'm not sure when that changed, and I don't remember what I did on my subsequent Core 2 Duo system, but I do have HT enabled on my current (8 or 9 year old) i7 3700 and don't notice any slowdowns. Last I looked, I had to look at very specific benchmarks to find measurable differences. Qualitatively, I don't feel a slowdown either, so I keep it enabled.

Dylan16807 1810 days ago

No, don't turn it off. One thread on a core will go full speed, and two threads on a core will do more work than one thread. It's just that your utilization graph will be misleading. A naive graph will assume that two threads do twice as much work as one, but the real improvement is much smaller.

If you turn off hyperthreading and keep the same exact workload, then instead of "the graph says 50% but it's really more like 80-90%", you'll have "the graph says 100% and it's correct". The numbers now accurately represent your lower capacity.
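
A back-of-the-envelope model of that, as a sketch only. It assumes the scheduler fills one thread per core before doubling up, and that a second thread on a core adds a fixed fraction `smt_gain` of extra work; both the model and the names are simplifications I'm making up for illustration.

```python
def real_utilization(graph_fraction: float, smt_gain: float) -> float:
    """Estimate the share of real capacity in use, given a naive utilization graph.

    graph_fraction: busy logical CPUs / total logical CPUs (what the graph shows).
    smt_gain: assumed extra throughput from a core's second thread (0.10 = 10%).
    """
    if not 0.0 <= graph_fraction <= 1.0:
        raise ValueError("graph_fraction must be between 0 and 1")
    if graph_fraction <= 0.5:
        # Only one thread per core is busy; each busy core runs at full speed.
        work = 2.0 * graph_fraction
    else:
        # Every core has one busy thread; the rest are second threads,
        # each adding only smt_gain worth of work.
        work = 1.0 + (2.0 * graph_fraction - 1.0) * smt_gain
    # Total capacity with both threads busy on every core is 1 + smt_gain.
    return work / (1.0 + smt_gain)


at_10_percent_gain = real_utilization(0.5, 0.10)  # about 0.91
at_25_percent_gain = real_utilization(0.5, 0.25)  # 0.8
```

Under this model, a graph reading of 50% corresponds to roughly 80-90% of real capacity for SMT gains between 25% and 10%.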
