by shaggie76
692 days ago
A grossly oversimplified argument for SMT that resonated with me was that it could keep a precious ALU busy while a thread stalls on a cache miss. I gather that in the early days the LPDDR used in laptops was also slower, and since cores were scarce, this was more valuable there. Lately, though, we often have more cores than we can scale across, and the value is harder to appreciate. We even avoid scheduling work on a core shared with an important thread, to avoid cache contention, because we know that thread's single-threaded performance will be the bottleneck. A while back I was testing Efficient/Performance cores and SMT for multithreaded rendering with DirectX 12; on my i7-12700K I found no benefit to either: using just the P-cores took about the same time to render a complex scene as P+SMT and P+E+SMT. It's not always a wash, though: on the Xbox Series X we found the same test ran marginally faster when we scheduled work on the SMT threads too.
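For anyone wanting to try the "keep important work off SMT siblings" idea themselves, here is a rough sketch. It is a Linux analogue, not what was used above (that test was on Windows/DirectX 12 and Xbox). It assumes the Linux sysfs file `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, keeps one logical CPU per physical core, and pins the calling thread to that set. The helper names (`parse_cpu_list`, `one_thread_per_core`, `read_siblings`) are hypothetical. Note that it does not separate P-cores from E-cores: E-cores have no SMT sibling, so they are kept.

```python
import glob
import os
import re


def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux CPU list such as "0,8" or "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def one_thread_per_core(siblings: dict[int, str]) -> set[int]:
    """Given {cpu: thread_siblings_list}, keep only the lowest-numbered
    logical CPU of each physical core, i.e. drop its SMT siblings."""
    return {cpu for cpu, text in siblings.items() if cpu == min(parse_cpu_list(text))}


def read_siblings() -> dict[int, str]:
    """Read SMT sibling lists from sysfs (Linux-only assumption; empty elsewhere)."""
    result: dict[int, str] = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            result[cpu] = f.read()
    return result


if __name__ == "__main__":
    cpus = one_thread_per_core(read_siblings())
    if hasattr(os, "sched_setaffinity"):
        # Only use CPUs this process is actually allowed to run on.
        allowed = cpus & os.sched_getaffinity(0)
        if allowed:
            # Pins the calling thread; threads it creates afterwards inherit the mask.
            os.sched_setaffinity(0, allowed)
    print("one logical CPU per core:", sorted(cpus))
```

Comparing wall-clock time for the same workload with and without this mask is roughly the P vs. P+SMT comparison described above, though results will vary by CPU and workload.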
SMT shines while waiting for I/O or doing some simple integer stuff. If both your threads can saturate the FPU, SMT is generally slower because of the extra tagging added to the data inside the CPU to note what belongs where.