| HN Mirror

	by uudecoded 1693 days ago
	I'm curious: since Intel is now also relying more and more on P and E cores, is there any reference or research available on optimizing multithreaded userland tasks with varying QoS? A lot of the pthreads books I see are from the late '90s. Is there a more recent reference? What's the best way to write cross-platform (i.e., not Grand Central Dispatch) multithreaded apps for these new chip architectures?
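A minimal sketch of the "thin portable wrapper" approach, assuming that the QoS class is the hint macOS uses to steer threads between core types and that the per-thread nice value is the closest Linux equivalent. The helper name `set_current_thread_background` is hypothetical; `pthread_set_qos_class_self_np` (Apple) and `setpriority` on a thread ID (Linux) are real calls, but neither guarantees that a thread lands on a particular kind of core.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#if defined(__APPLE__)
#include <pthread.h>
#include <pthread/qos.h>
#elif defined(__linux__)
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Hypothetical helper: ask the OS to treat the calling thread as
   background/batch work. Returns true if the hint was accepted. */
static bool set_current_thread_background(void) {
#if defined(__APPLE__)
    /* Assumption: the QoS class is what macOS uses for P/E placement. */
    return pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0) == 0;
#elif defined(__linux__)
    /* On Linux, PRIO_PROCESS with a thread ID changes only that thread. */
    pid_t tid = (pid_t)syscall(SYS_gettid);
    return setpriority(PRIO_PROCESS, (id_t)tid, 10) == 0; /* nice +10 */
#else
    return false; /* no mechanism wired up for this platform in the sketch */
#endif
}
```

Call it at the top of a batch worker (indexing, compression, and so on). On Linux an unprivileged thread can raise its own nice value but generally cannot lower it again without `CAP_SYS_NICE` or a suitable `RLIMIT_NICE`, which is why the sketch only goes in one direction.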

4 comments

majou 1692 days ago

I just ran into this tweet about Intel: https://twitter.com/DeepSchneider/status/1456314755380097027

They move everything that isn't foreground to an efficiency core, which is awful for compiling or video processing.

There's apparently a BIOS option that lets Scroll Lock disable the efficiency cores entirely.

uudecoded 1692 days ago

Thank you for sharing this; it's interesting. I've also gotten the impression (though I lack a citation) that Intel's E cores are aimed at thermal isolation rather than power minimization, which the M1 may be targeting.

This is front of mind for me since reading a Cloudflare blog post about AVX-512 instructions triggering dynamic frequency scaling to manage the chip's power/thermal capacity. (https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...)

If this is happening on Xeons, it's probably happening on consumer dies as well, in addition to other non-obvious power/performance optimizations. Perhaps this is why Alder Lake is pumping up the TDP[1]?

edit: [1] https://news.ycombinator.com/item?id=29106860

cesarb 1692 days ago

> They move everything that isn't foreground to an efficiency core, which is awful for compiling or video processing.

Windows has had that (foreground boost) for a long time; Intel probably piggybacks on it. It'll be interesting to see how this behaves on Linux, which AFAIK never had that mechanism (except perhaps on Android).

dwaite 1692 days ago

For Linux, I believe the scheduler will dispatch based on the niceness level and overall CPU utilization: past a certain threshold, it will start putting work at default or higher priority onto the performance cores.
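To illustrate the knob being described, here is a Linux-only sketch in which each worker thread in one process sets its own nice value, so interactive and batch work can carry different priorities. It relies on the Linux-specific behavior that `setpriority(PRIO_PROCESS, tid, nice)` applies to a single thread (POSIX treats nice as per-process); the `worker` struct and `run_workers` helper are hypothetical. Whether a higher-niceness thread actually ends up on an E core is still the scheduler's call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAX_WORKERS 16

struct worker {
    int requested_nice; /* nice value this worker asks for */
    int observed_nice;  /* what getpriority() reports afterwards */
    int ok;             /* 1 if setpriority() succeeded */
};

static void *worker_main(void *arg) {
    struct worker *w = arg;
    pid_t tid = (pid_t)syscall(SYS_gettid);
    /* Linux-specific: a thread ID here targets only this thread. */
    w->ok = setpriority(PRIO_PROCESS, (id_t)tid, w->requested_nice) == 0;
    errno = 0; /* -1 is a valid nice value, so check errno instead */
    int n = getpriority(PRIO_PROCESS, (id_t)tid);
    w->observed_nice = (errno == 0) ? n : -100; /* -100: sentinel for error */
    /* ... the worker's actual interactive or batch work would go here ... */
    return NULL;
}

/* Starts one thread per worker and joins them all. Returns 0 on success. */
static int run_workers(struct worker *ws, int count) {
    pthread_t threads[MAX_WORKERS];
    int started = 0;
    if (count < 0 || count > MAX_WORKERS) return -1;
    for (; started < count; started++)
        if (pthread_create(&threads[started], NULL, worker_main, &ws[started]) != 0)
            break;
    for (int i = 0; i < started; i++) /* join only threads that started */
        pthread_join(threads[i], NULL);
    return started == count ? 0 : -1;
}
```

New threads inherit the creating thread's nice value on Linux, so the sketch has each worker set its own value explicitly rather than relying on the parent.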

For the Mac, I believe you have equivalent scheduling control through both POSIX threads and GCD, but the configuration is likely far more approachable in GCD.

Also: on M1, there is an added capability to run in a stricter memory model to speed up x86-64 emulation. This is only available on the performance cores, which is one of the reasons people observe non-native code draining the battery more quickly.
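For context on the "stricter memory model" (TSO, per the replies), here is a generic C11 sketch of the message-passing pattern it protects; it is an illustration, not Apple's implementation. On x86 the release store and acquire load compile to ordinary `mov` instructions, because TSO already keeps stores ordered with stores and loads with loads. On Arm's weaker default model they need ordering instructions such as `stlr`/`ldar`, so a translator that only sees plain x86 moves would have to add ordering to many accesses unless the hardware offers a TSO mode.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;          /* ordinary data being handed over */
static atomic_int ready = 0; /* flag published after the data */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                           /* 1: write data */
    atomic_store_explicit(&ready, 1, memory_order_release); /* 2: publish */
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ; /* spin until the flag is visible */
    *out = payload; /* acquire pairs with release, so this reads 42 */
    return NULL;
}

/* Runs one producer/consumer handoff and returns what the consumer saw. */
static int run_message_passing(void) {
    int seen = 0;
    pthread_t p, c;
    payload = 0;
    atomic_store(&ready, 0); /* reset so the demo can be rerun */
    if (pthread_create(&p, NULL, producer, NULL) != 0) return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

With plain (non-atomic) accesses to `ready`, neither the compiler nor weakly ordered hardware would be obliged to keep the two writes in order, which is exactly the gap that TSO closes for x86 code.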

saagarjha 1692 days ago

M1's cores are homogenous and all of them support TSO.

masklinn 1692 days ago

Saying that the M1's cores are homogenous is pretty misleading/confusing, as the Icestorm and Firestorm cores are rather different. big.LITTLE/DIQ-type architectures are usually considered heterogenous even if all the cores share an ISA (because you can't treat all the cores the same).

But as to the latter assertion, you're indeed correct per Joe Groff (Swift compiler engineer at Apple): https://twitter.com/jckarter/status/1332045390057639939

> The A12 only supported TSO on the performance cores. The M1 supports it on all cores.

saagarjha 1692 days ago

Yeah, when I said "homogenous" I was solely referring to the ISA. Trying to enable TSO on a Tempest core will fail with an undefined instruction exception, but I think A12Z is ISA homogenous in userspace.

sydthrowaway 1692 days ago

My understanding is that as long as you specify the QoS correctly, GCD takes care of it (as it has done for Apple's Ax SoCs on iPhone).

wmf 1692 days ago

I think people are just now starting that research and blog posts like this one are all we have so far.

dev_tty01 1692 days ago

Asymmetric multiprocessing has been a big topic of research for many, many years.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C33&q=asy...

wmf 1692 days ago

Yeah, but I'd bet 90% of that research makes wacky assumptions that don't apply to real processors. When real hardware becomes available you start over from scratch. (Source: I am a former CS researcher.)

seniorivn 1692 days ago

Hasn't Arm's big.LITTLE architecture been the norm in widely used processors for a decade or so?