Hacker News new | ask | show | jobs
by s3cur3 2767 days ago
They did a follow up changing the scheduling policy for thread 0 (again, still on Windows) and it didn’t make a difference for almost all their workloads: https://www.anandtech.com/show/13446/the-quiz-on-cpu-0-playi...
2 comments

AnandTech really needs to hire a Linux-focused editor to do some benchmarks there too, especially for these large systems that are unlikely to be running Windows anyways.

The Phoronix benchmarks are quite clear,[0] I don't know why you keep linking to AnandTech's Windows benchmarks. I say this as someone who reads tons of AnandTech reviews because they're great, but Windows just doesn't do well with high core count hardware at all.

[0]: https://www.phoronix.com/scan.php?page=article&item=2990wx-l...

This seems like a familiar issue I've run into with workstations I've used in the past running Xeons. Not sure how NTOSKRNL handles scheduling of parallel tasks. I'd venture a guess and say it's hybrid (M:N threads), where multiple userland application threads are mapped to some "virtual processor" in kernelmode. That leads to priority inversion between the userland and kernelmode threads, which could explain why Windows benchmarks are terrible when dealing with multiple physical cores.
As far as I know Win NT threads are 1:1.

Not even sure how it would work or even make any sense to have N:M handled by the kernel. N:M is usually a mainly a userspace thing. And Windows is even less likely to use that kind of convolution, because IIRC it can call back from kernel to userspace (that design I would not recommend, btw, but oh well). You have fibers, of course, but that's a different thing.

Windows does not scale probably simply because the kernel is full of "big" locks (at least not small enough...) everywhere, and they have far less fancy structures and algo than Linux (is there any equivalent of RCU that is widely used in there? - not sure). Cf the classic posts of the builder of Chrome who every now and then encounter a ridiculous slowdown of his builds on moderately big computers, sometimes because of mutexes badly placed.

> Not even sure how it would work or even make any sense to have N:M handled by the kernel. N:M is usually a mainly a userspace thing.

Correct, I meant that the benchmarking program itself probably used that implementation. Not the Win NT kernel’s implementation of OS threads.

See the phoronix benchmarks, linux performs much better on same workloads.