Hacker News
by 5436436347 2767 days ago
Those (AnandTech) benchmarks were performed on Windows. All Threadripper benchmarks on Linux show that it is nowhere near as awful a performer as on Windows, and most compute workloads scale okay. I've seen multiple ideas thrown around, such as Windows not being NUMA-aware with this processor, or just plain bad core scheduling.

For reference, here is a link to the Phoronix article comparing Windows and Linux on the 2990WX:

https://www.phoronix.com/scan.php?page=article&item=2990wx-l...

They did a follow-up changing the scheduling policy for thread 0 (again, still on Windows), and it didn't make a difference for almost all of their workloads: https://www.anandtech.com/show/13446/the-quiz-on-cpu-0-playi...
AnandTech really needs to hire a Linux-focused editor to run benchmarks on Linux too, especially for these large systems that are unlikely to be running Windows anyway.

The Phoronix benchmarks are quite clear[0]; I don't know why you keep linking to AnandTech's Windows benchmarks. I say this as someone who reads tons of AnandTech reviews because they're great, but Windows just doesn't do well with high-core-count hardware at all.

[0]: https://www.phoronix.com/scan.php?page=article&item=2990wx-l...

This seems like a familiar issue; I've run into it in the past on workstations running Xeons. I'm not sure how NTOSKRNL handles scheduling of parallel tasks. I'd venture a guess that it's hybrid (M:N threads), where multiple userland application threads are mapped onto some "virtual processor" in kernel mode. That would lead to priority inversion between the userland and kernel-mode threads, which could explain why Windows benchmarks are terrible on machines with many physical cores.
As far as I know, Win NT threads are 1:1.

Not even sure how it would work or even make any sense to have N:M handled by the kernel. N:M is usually mainly a userspace thing. And Windows is even less likely to use that kind of convoluted scheme, because IIRC it can call back from the kernel into userspace (a design I would not recommend, btw, but oh well). You have fibers, of course, but that's a different thing.
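To make the 1:1 vs. N:M distinction concrete, here is a minimal Python sketch of an M:N scheduler living entirely in userspace: M generator-based tasks are multiplexed over N OS threads, and each `yield` hands the OS thread back to the scheduler. The names `run_m_on_n` and `counting_task` are hypothetical illustrations; this is not a Windows API and not how NT or any particular runtime actually does it. With `n_workers=1` it degenerates into cooperative, fiber-like scheduling on one OS thread. Note that CPython's GIL keeps these threads from running Python code in parallel, so the sketch shows the mapping, not a speedup.

```python
import queue
import threading

def run_m_on_n(tasks, n_workers):
    """Run M generator-based tasks on n_workers OS threads (an M:N scheme).

    Each `yield` inside a task hands its worker thread back to this
    userspace scheduler; the task is re-queued so any worker may resume it.
    With n_workers=1 this degenerates into cooperative, fiber-like
    scheduling on a single OS thread.
    """
    if not tasks:
        return
    ready = queue.Queue()
    remaining = len(tasks)
    lock = threading.Lock()

    def finish_one():
        nonlocal remaining
        with lock:
            remaining -= 1
            if remaining == 0:
                for _ in range(n_workers):
                    ready.put(None)  # sentinel: tell every worker to exit

    def worker():
        while True:
            task = ready.get()
            if task is None:
                return
            try:
                next(task)           # run the task until its next yield
            except StopIteration:
                finish_one()         # the task body returned
            else:
                ready.put(task)      # still runnable: back of the queue

    for t in tasks:
        ready.put(t)
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

def counting_task(name, steps, log):
    # Hypothetical demo task: records which OS thread ran each step.
    for i in range(steps):
        log.append((name, i, threading.current_thread().name))
        yield  # give the worker back so other tasks can run

log = []
tasks = [counting_task(f"t{k}", 3, log) for k in range(8)]  # M = 8 tasks
run_m_on_n(tasks, n_workers=2)                              # N = 2 OS threads
print(len(log))  # 24
```

The kernel only ever sees the N worker threads; which of the M tasks runs on them is decided by the userspace queue, which is why N:M is normally a runtime or library concern rather than a kernel one.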

Windows probably doesn't scale simply because the kernel is full of "big" locks everywhere (or at least locks that aren't fine-grained enough), and it has far fewer fancy data structures and algorithms than Linux (is there any widely used equivalent of RCU in there? Not sure). Cf. the classic posts by the Chrome developer who every now and then runs into a ridiculous slowdown of his builds on moderately big machines, sometimes because of badly placed mutexes.
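The "big lock" point can be illustrated with a minimal Python sketch of lock granularity: one lock guarding a whole table versus lock striping, where keys hash onto shards that each have their own lock, so threads touching different shards never wait on each other. The classes `CoarseCounters`, `StripedCounters` and the `hammer` helper are hypothetical; this shows the generic technique only, not how the NT kernel is structured, and it does not cover RCU. Under CPython's GIL it demonstrates the structure rather than a measurable scaling difference.

```python
import threading

class CoarseCounters:
    """Every key shares one "big" lock, so all writers serialize on it."""
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def incr(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)

class StripedCounters:
    """Keys are hashed onto n_stripes shards, each with its own lock;
    writers that hit different shards do not contend."""
    def __init__(self, n_stripes=16):
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._shards = [{} for _ in range(n_stripes)]

    def _idx(self, key):
        return hash(key) % len(self._locks)

    def incr(self, key):
        i = self._idx(key)
        with self._locks[i]:
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + 1

    def get(self, key):
        i = self._idx(key)
        with self._locks[i]:
            return self._shards[i].get(key, 0)

def hammer(counters, n_threads=8, per_thread=1000):
    # Hypothetical workload: each thread increments keys k0..k31 round-robin.
    def work(tid):
        for j in range(per_thread):
            counters.incr(f"k{(tid + j) % 32}")
    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

coarse, striped = CoarseCounters(), StripedCounters()
hammer(coarse)
hammer(striped)
keys = [f"k{i}" for i in range(32)]
print(sum(coarse.get(k) for k in keys), sum(striped.get(k) for k in keys))  # 8000 8000
```

Both versions give the same counts; the difference is that with the coarse lock every increment on every core queues behind the same mutex, which is the kind of contention that gets worse as core counts grow.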

> Not even sure how it would work or even make any sense to have N:M handled by the kernel. N:M is usually mainly a userspace thing.

Correct; I meant that the benchmarking program itself probably used that implementation, not the Win NT kernel's implementation of OS threads.

See the Phoronix benchmarks: Linux performs much better on the same workloads.