

	by 5436436347 2767 days ago
	Those (AnandTech) benchmarks were performed on Windows. All Threadripper benchmarks on Linux show that it is nowhere near as awful a performer as on Windows, and most compute workloads do scale okay. I've seen multiple explanations thrown around, such as Windows not being NUMA-aware with this processor, or just plain bad core scheduling.

celrod 2767 days ago

For reference, here is a link to the Phoronix Windows vs Linux on the 2990WX article:

https://www.phoronix.com/scan.php?page=article&item=2990wx-l...

s3cur3 2767 days ago

They did a follow-up, changing the scheduling policy for thread 0 (again, still on Windows), and it didn’t make a difference for almost all their workloads: https://www.anandtech.com/show/13446/the-quiz-on-cpu-0-playi...

coder543 2767 days ago

AnandTech really needs to hire a Linux-focused editor to do some benchmarks there too, especially for these large systems that are unlikely to be running Windows anyways.

The Phoronix benchmarks are quite clear.[0] I don't know why you keep linking to AnandTech's Windows benchmarks. I say this as someone who reads tons of AnandTech reviews because they're great, but Windows just doesn't do well with high core count hardware at all.

[0]: https://www.phoronix.com/scan.php?page=article&item=2990wx-l...

0x8BADF00D 2767 days ago

This seems like a familiar issue, one I've run into in the past on Xeon workstations. I'm not sure how NTOSKRNL handles scheduling of parallel tasks. I'd venture a guess that it's hybrid (M:N threads), where multiple userland application threads are mapped to some "virtual processor" in kernel mode. That would lead to priority inversion between the userland and kernel-mode threads, which could explain why Windows benchmarks are terrible when dealing with multiple physical cores.

temac 2767 days ago

As far as I know Win NT threads are 1:1.

I'm not even sure how it would work, or whether it would make any sense, to have N:M handled by the kernel. N:M is usually mainly a userspace thing. And Windows is even less likely to use that kind of convolution, because IIRC it can call back from kernel to userspace (a design I would not recommend, btw, but oh well). You have fibers, of course, but that's a different thing.

Windows probably does not scale simply because the kernel is full of "big" locks (or at least locks that are not small enough) everywhere, and it has far fewer fancy data structures and algorithms than Linux (is there any equivalent of RCU that is widely used in there? Not sure). Cf. the classic posts by the engineer who builds Chrome and who every now and then encounters a ridiculous slowdown of his builds on moderately big computers, sometimes because of badly placed mutexes.
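
To make the "big lock" point concrete, here is a minimal sketch (an editorial illustration, not anything taken from the Windows or Linux kernels). The class names `BigLockCounter` and `StripedCounter` and the helper `hammer` are hypothetical. The first guards all state with one lock, so every update serializes; the second hashes keys onto independent shards, each with its own lock, so unrelated updates do not contend. Note that CPython's GIL means this shows the structure only, not an actual speedup.

```python
import threading


class BigLockCounter:
    """One lock guards everything: every update serializes on it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def add(self, key, n=1):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + n

    def total(self):
        with self._lock:
            return sum(self._counts.values())


class StripedCounter:
    """Keys are hashed onto shards; each shard has its own small lock."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._shards = [{} for _ in range(stripes)]

    def add(self, key, n=1):
        i = hash(key) % len(self._locks)
        with self._locks[i]:  # only this shard is blocked
            shard = self._shards[i]
            shard[key] = shard.get(key, 0) + n

    def total(self):
        total = 0
        for lock, shard in zip(self._locks, self._shards):
            with lock:
                total += sum(shard.values())
        return total


def hammer(counter, threads=8, per_thread=1000):
    """Update the counter from several threads and return the final total."""

    def work(tid):
        for i in range(per_thread):
            counter.add((tid, i % 32))

    workers = [threading.Thread(target=work, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.total()
```

Both variants give the same answer; the difference is only in how much of the state a single update blocks, which is what matters once dozens of cores hit the same structure.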

0x8BADF00D 2767 days ago

> Not even sure how it would work or even make any sense to have N:M handled by the kernel. N:M is usually mainly a userspace thing.

Correct. I meant that the benchmarking program itself probably used that implementation, not the Win NT kernel’s implementation of OS threads.
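
For readers unfamiliar with the 1:1 versus userspace-multiplexed distinction discussed here, a small sketch may help (an editorial illustration; the function names are hypothetical, and it says nothing about how any particular benchmark was written). In CPython, each `threading.Thread` is backed by its own OS thread (1:1), whereas `asyncio` tasks are scheduled in userspace on a single OS thread (N:1, the simplest case of N:M). `threading.get_native_id()` returns the kernel-assigned thread ID, so it can be used to observe the mapping.

```python
import asyncio
import threading


def native_ids_of_threads(n=4):
    """Start n threads and record the kernel thread ID each one runs on."""
    ids = []
    lock = threading.Lock()
    barrier = threading.Barrier(n)

    def work():
        with lock:
            ids.append(threading.get_native_id())
        # Keep all threads alive at the same time so the OS cannot
        # recycle an ID from a thread that has already exited.
        barrier.wait()

    workers = [threading.Thread(target=work) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return ids


async def _task_ids(n):
    async def work():
        await asyncio.sleep(0)  # yield to the event loop at least once
        return threading.get_native_id()

    return await asyncio.gather(*(work() for _ in range(n)))


def native_ids_of_tasks(n=4):
    """Run n asyncio tasks and record the kernel thread ID each one runs on."""
    return asyncio.run(_task_ids(n))
```

With four threads, four distinct kernel thread IDs come back; with four tasks, all of them report the same ID, because the kernel only ever sees the one thread running the event loop.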

mda 2767 days ago

See the Phoronix benchmarks: Linux performs much better on the same workloads.
