SMT is usually a throughput win, and usually a latency loss. I was playing with my Threadripper a while back and found that for C++ builds of large projects, HT gives about a 10% improvement in compilation speed. 10% is a big deal and you should take it. The downside is that games had noticeably lower framerates even with the rest of the CPU idle (at least the games I play are bound by single-thread performance across maybe 2-4 cores). I kind of blame Windows's scheduler there, since it should be able to say "hey, a game is running, don't schedule anything on the same physical core that has a game thread running on it", but I don't think it does. It might schedule two game threads on the same physical core, and then they contend with each other and each run 40% slower. Also be careful about memory -- 64GB wasn't enough for 64 concurrent clang runs. You need a little bit more (but of course 128GB is the next installable increment). (I can also see more threads aggravating other resource constraints, notably disk IOPS, but I didn't notice a problem there myself. It's also possible that SMT increases power use and so decreases turbo speeds, and that might have an impact. I didn't measure that when I was testing.)
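On the memory point, here's a rough sketch of how you might cap the parallel job count so that enabling SMT doesn't push a build into swap. The 1.5 GiB per-job figure and the 4 GiB reserve are made-up illustration values, not measurements; all I actually know is that 64 jobs didn't fit in 64GB. The function name is hypothetical too.

```python
def max_parallel_jobs(logical_cpus, ram_gib, per_job_gib, reserve_gib=4.0):
    """Largest job count that fits in RAM, never above the logical CPU count.

    per_job_gib and reserve_gib are assumptions you would have to measure
    for your own project and desktop; they are not from any real benchmark.
    """
    usable = max(ram_gib - reserve_gib, 0.0)
    by_memory = int(usable // per_job_gib)
    # Always allow at least one job, even on a badly undersized machine.
    return max(1, min(logical_cpus, by_memory))


if __name__ == "__main__":
    # 64 logical CPUs, assumed 1.5 GiB per compiler process.
    print(max_parallel_jobs(64, 64, 1.5))   # memory-bound
    print(max_parallel_jobs(64, 128, 1.5))  # CPU-bound
```

With those assumed numbers, the 64GB machine is limited by memory well below 64 jobs, while the 128GB machine can use every logical CPU.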
For me, I keep SMT turned off. The latency is more important than the throughput for my workstation, but if you do full builds of C++ regularly, you might want it on. Use the 10% time you're getting back to switch to a build system that can cache things, though.
Notably, triple-rank configurations work just fine.
It's just not trivial to find both a dual-rank module and a single-rank module with the same components used on both (so all the sub-timings match and performance will be nice and proper).
Depends on the machine. The earliest implementations of HT worked by statically partitioning various caches and other resources in the processor core in half, which meant that a single-threaded process really could slow down by having HT enabled but not actively used. Newer desktop-class processors tend to have no significant downsides to leaving HT enabled, but there might still be some SMT implementations on niche products that don't handle this well.
There still are (and probably always will be) some workloads where using HT makes the whole task take longer, but unless that's the only kind of load you run, optimizing on modern CPUs is simply a matter of, e.g., capping the number of worker threads at the physical core count rather than the logical thread count when you run those loads.
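For what "load up to the core count" can look like in practice, here's a minimal sketch that picks one logical CPU per physical core. It assumes Linux, where each CPU exposes its SMT siblings in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` as a list like `0,32` or `0-1`. The helper names are my own, and I haven't checked this on unusual topologies.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,32' or '0-3,8' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(sibling_lists):
    """Given every CPU's sibling list, return the lowest-numbered CPU of each core."""
    # Siblings on the same core report identical lists, so a set deduplicates them.
    cores = {tuple(sorted(parse_cpu_list(s))) for s in sibling_lists}
    return sorted(group[0] for group in cores if group)


def read_sibling_lists():
    """Read sibling lists from sysfs (Linux only; returns [] elsewhere)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    cpus = one_cpu_per_core(read_sibling_lists())
    print(f"{len(cpus)} physical cores, one logical CPU each: {cpus}")
```

On Linux you could then pass that list to `os.sched_setaffinity(0, cpus)` before starting the workload, so it never lands two threads on the same physical core, or just use `len(cpus)` as the worker count.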
> The earliest implementations of HT worked by statically partitioning
Do you mean SMT in general? I don't think hyperthreading specifically has ever done that, but if I'm wrong I'd love to know more. (And AMD's version falls under "newer desktop-class processors")
I meant Intel HT specifically, but I'm going off memory here, and having trouble finding details on those old parts. Agner Fog's current microarchitecture manual doesn't mention HT in its discussion of the P4, but it does include at least one mention of static partitioning of the decoded op queue in the Atom core.
It also describes several instances where Intel's desktop cores used to devote specific resources to each thread on alternating clock cycles, but newer cores have progressively removed those limitations. However, these probably don't quite fit my original assertion because if the OS has literally HALTed one of the virtual CPUs, these alternating clock cycle limitations may have been temporarily removed.
My first experience with HT was on my dual P4 Xeons. Performance with HT on was noticeably terrible. It felt like a dog and pony show. It was best to keep HT disabled then. I'm not sure when that changed, and I don't remember what I did on my subsequent Core 2 Duo system, but I do have HT enabled on my current (8 or 9 year old) i7 3700 and don't notice any slowdowns. Last I looked, I had to look at very specific benchmarks to find measurable differences. Qualitatively, I don't feel a slowdown either, so I keep it enabled.
No, don't turn it off. One thread on a core will go full speed, and two threads on a core will do more work than one thread. It's just that your utilization graph will be misleading. A naive graph will assume that two threads do twice as much work as one, but the real improvement is much smaller.
If you turn off hyperthreading and keep the same exact workload, then instead of "the graph says 50% but it's really more like 80-90%", you'll have "the graph says 100% and it's correct". The numbers now accurately represent your lower capacity.
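To put rough numbers on that, here's a toy model of the misleading graph. It assumes two threads per core, a scheduler that fills one thread per core before doubling up, and a single made-up `smt_gain` factor (throughput of a core running two threads relative to one). Real workloads won't be this tidy.

```python
def real_utilization(graph_utilization, smt_gain):
    """Estimate the fraction of real capacity in use.

    graph_utilization: busy logical CPUs / total logical CPUs (0.0 to 1.0).
    smt_gain: assumed throughput of two threads on a core relative to one,
              e.g. 1.10 for a 10% SMT benefit. This is a model parameter,
              not something the OS reports.
    """
    g = graph_utilization
    if g <= 0.5:
        # Only one thread per core is busy: each busy core runs at full speed.
        work = 2 * g
    else:
        # Every core has one thread; a fraction (2g - 1) of cores has two.
        doubled = 2 * g - 1
        work = (1 - doubled) + doubled * smt_gain
    # Maximum capacity is smt_gain per core, reached when all threads are busy.
    return work / smt_gain


if __name__ == "__main__":
    for gain in (1.10, 1.25):
        print(f"SMT gain {gain:.2f}: graph 50% -> real {real_utilization(0.5, gain):.0%}")
```

With an SMT gain between 10% and 25%, a graph reading of 50% works out to roughly 80-90% of what the machine can actually do, which is where the "80-90%" figure comes from.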