| HN Mirror

	by gpderetta 2363 days ago
	Also, if both threads are pinned to separate cores and nothing else is supposed to run on those cores, it is pointless to use anything but spinlocks: there is no other thread that could make better use of the core (and you probably do not want the core to drop into a low-power state waiting for an interrupt).
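
A minimal sketch of the spinlock side of this, assuming Rust and only its standard library (`SpinLock` and `SpinGuard` are made-up names for illustration). Waiters never block: they retry a compare-and-swap and issue `std::hint::spin_loop()`, which emits a CPU spin-wait hint such as PAUSE on x86. Pinning each thread to its own core is a separate step that the Rust standard library does not expose.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical test-and-set spinlock: waiters keep the core and retry.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: access to `value` is serialized by `locked`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

pub struct SpinGuard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    pub fn lock(&self) -> SpinGuard<'_, T> {
        // Acquire on success makes the previous holder's writes visible.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Spin-wait hint (PAUSE on x86); the thread never sleeps.
            hint::spin_loop();
        }
        SpinGuard { lock: self }
    }
}

impl<T> Deref for SpinGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> DerefMut for SpinGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<T> Drop for SpinGuard<'_, T> {
    fn drop(&mut self) {
        // Release publishes the writes made while the lock was held.
        self.lock.locked.store(false, Ordering::Release);
    }
}
```

Usage is the same as a mutex guard, e.g. `*lock.lock() += 1;`. The difference is only what a waiter does while the lock is taken: it burns its (dedicated) core instead of going to sleep.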

jdblair 2363 days ago

You're discounting energy use. This is a bad strategy on a battery-powered device.

sephamorr 2363 days ago

Besides energy, power draw is another reason: parking a core lets it cool down, so that when it is put back in use (milli)seconds later, it can run at a higher clock speed for longer.

corysama 2363 days ago

Intel has a low-power PAUSE instruction that is literally a ‘rep nop’. I assume Arm has one too.

temac 2363 days ago

That's not especially low power compared to real low-power states. The main advantage of PAUSE is the scheduling of the other hyperthread (if it exists) and maybe not generating a gratuitous L1 / MESI workload at a crazy rate (well, if programmed correctly that should be quite cheap in lots of cases, but still...). To my knowledge it does not gate any clock, so the power savings are going to be minimal.
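
"Programmed correctly" here usually means test-and-test-and-set: waiters spin on a plain load, which lets the cache line stay shared among them, and only attempt the write (the swap) once the lock looks free, so they do not keep bouncing the line between cores. A sketch assuming Rust; `TtasLock` is a hypothetical name, not a library type.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical test-and-test-and-set lock.
pub struct TtasLock {
    locked: AtomicBool,
}

impl TtasLock {
    pub const fn new() -> Self {
        TtasLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        loop {
            // The write attempt: swap returns the previous value.
            if !self.locked.swap(true, Ordering::Acquire) {
                return;
            }
            // Read-only spin: no stores until the holder releases the lock.
            while self.locked.load(Ordering::Relaxed) {
                hint::spin_loop();
            }
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```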

dfox 2363 days ago

IIRC the mov imm, %ecx; rep nop sequences are somewhat special-cased by modern architectures (and this fact is the only reason why you would even want to execute such code). On the other hand, the energy savings are mostly negligible and it is simply an SMT-level equivalent of sched_yield().
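
The SMT-level and OS-level yields can be combined: spin with the CPU hint for a bounded number of checks, then fall back to `std::thread::yield_now()`, which on Linux goes through `sched_yield`. A sketch assuming Rust; `wait_for` and the `spin_limit` budget are illustrative, and a real budget would need tuning per CPU.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical helper: waits until `flag` is set. Busy-spins with the CPU
/// hint for up to `spin_limit` checks, then yields to the OS scheduler.
/// Returns how many times it yielded.
pub fn wait_for(flag: &AtomicBool, spin_limit: u32) -> u64 {
    let mut spins = 0;
    let mut yields = 0;
    while !flag.load(Ordering::Acquire) {
        if spins < spin_limit {
            spins += 1;
            // SMT-level: hint that we are spinning (e.g. PAUSE on x86);
            // this thread keeps the core.
            hint::spin_loop();
        } else {
            yields += 1;
            // OS-level: let another runnable thread have this core.
            thread::yield_now();
        }
    }
    yields
}
```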

gpderetta 2363 days ago

Actually, I heard that the last few generations of Intel CPUs (from Skylake on) enter a power-saving state more aggressively on PAUSE, and that the latency of getting out of a pause went up from tens of cycles to hundreds. No first-hand testing, though.
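
A rough way to test this first-hand is to time a large number of spin hints and divide. A sketch assuming Rust (`spin_hint_cost_ns` is a made-up name); it ignores turbo, frequency scaling and a busy sibling hyperthread, so treat the result as an order of magnitude to compare across CPU generations, not a precise latency.

```rust
use std::hint;
use std::time::Instant;

/// Hypothetical microbenchmark: average cost of one spin-wait hint in ns.
pub fn spin_hint_cost_ns(iterations: u32) -> f64 {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        // Emits the CPU's spin-wait hint (PAUSE on x86).
        hint::spin_loop();
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}
```

Running it with a few million iterations on a pre-Skylake and a Skylake-or-later machine and comparing the two numbers would show whether the per-pause cost really went up by roughly an order of magnitude.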

gpderetta 2363 days ago

Yes, you wouldn't use this strategy on a battery-powered device. It is for very specialised applications.

rumanator 2363 days ago

> and nothing else is supposed to run on those cores

That's quite the corner case.

titzer 2363 days ago

This is exactly the situation for a well-balanced parallel work queue. You want to start as many threads as there are cores and run them full tilt, pulling work off the queue until it is empty. If you're running a large-scale cluster dedicated to a particular task (e.g. servicing a special kind of query, encoding videos, rendering), this is very common, and so is something as small as a parallel Photoshop filter.
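
A minimal sketch of that pattern, assuming Rust: one worker per core reported by `std::thread::available_parallelism()`, with the queue reduced to a shared atomic cursor that each worker advances with `fetch_add` until it runs past the end. `process_all` is a hypothetical helper, not a library API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: applies `work` to every item, one thread per core.
pub fn process_all<T, R, F>(items: &[T], work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let cursor = AtomicUsize::new(0);
    // Plain references are Copy, so each `move` closure gets its own copy.
    let (cursor, work) = (&cursor, &work);

    let mut results: Vec<(usize, R)> = thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(move || {
                let mut done = Vec::new();
                loop {
                    // Claim the next unprocessed item.
                    let i = cursor.fetch_add(1, Ordering::Relaxed);
                    if i >= items.len() {
                        break; // queue is empty
                    }
                    done.push((i, work(&items[i])));
                }
                done
            }));
        }
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });

    // Workers finish items in arbitrary order; restore input order.
    results.sort_by_key(|&(i, _)| i);
    results.into_iter().map(|(_, r)| r).collect()
}
```

For example, `process_all(&frames, |f| encode(f))` (with `frames` and `encode` standing in for your own data and work function) keeps every core busy until the last item is claimed.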

rumanator 2363 days ago

> This is exactly the situation for a well-balanced parallel work queue.

What if your work queues are running on a multitasking operating system that runs services? And what about a hypervisor?

johncolanduoni 2363 days ago

For this technique you generally dedicate some core(s) to those miscellaneous threads and flag the rest as unschedulable unless a thread is specifically assigned to them.

If you’re not sharing cores between VMs, it’s typical to do the same at the hypervisor layer.
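
On Linux, the dedicated cores are commonly set aside with the `isolcpus=` boot parameter, and the kernel reports them in `/sys/devices/system/cpu/isolated` as a cpulist such as `2-5,7`. The sketch below assumes Rust, and `parse_cpulist` and `isolated_cpus` are illustrative names. It only discovers those cores: actually pinning a thread to one needs `sched_setaffinity`/`pthread_setaffinity_np`, which the Rust standard library does not wrap (the `libc` crate does). It also ignores the stride syntax some kernels accept in cpulists.

```rust
use std::fs;

/// Hypothetical parser for a Linux cpulist such as "2-5,7".
/// Returns None on malformed input.
pub fn parse_cpulist(s: &str) -> Option<Vec<usize>> {
    let mut cpus: Vec<usize> = Vec::new();
    for part in s.trim().split(',').map(str::trim).filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            Some((lo, hi)) => {
                let lo: usize = lo.parse().ok()?;
                let hi: usize = hi.parse().ok()?;
                if lo > hi {
                    return None;
                }
                cpus.extend(lo..=hi);
            }
            None => cpus.push(part.parse::<usize>().ok()?),
        }
    }
    Some(cpus)
}

/// CPUs the kernel has isolated from general scheduling (empty if none or
/// if the sysfs file is unavailable): candidates for one pinned worker each.
pub fn isolated_cpus() -> Vec<usize> {
    fs::read_to_string("/sys/devices/system/cpu/isolated")
        .ok()
        .and_then(|s| parse_cpulist(&s))
        .unwrap_or_default()
}
```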

shaklee3 2363 days ago

This is the normal use case for any DPDK software. I think anyone involved in HPC or high-speed networking knows that this is pretty common.
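
DPDK applications typically run poll-mode drivers on dedicated cores and hand work between cores through rings, without ever sleeping. The shape of that, sketched in Rust rather than with DPDK's own APIs: a bounded single-producer/single-consumer ring plus a consumer that busy-polls it. `SpscRing` and `poll_sum` are hypothetical names, and the ring is only sound with exactly one producer thread and one consumer thread.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical bounded SPSC ring. Only the producer writes `tail`,
/// only the consumer writes `head`; both only ever increase.
pub struct SpscRing<T> {
    slots: Box<[UnsafeCell<Option<T>>]>,
    head: AtomicUsize, // next slot to read
    tail: AtomicUsize, // next slot to write
}

// Safety: each slot is handed between the two sides via the Release/Acquire
// pairs on `head` and `tail`. Requires one producer and one consumer.
unsafe impl<T: Send> Sync for SpscRing<T> {}

impl<T> SpscRing<T> {
    pub fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0);
        SpscRing {
            slots: (0..cap).map(|_| UnsafeCell::new(None)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side. Hands the value back if the ring is full.
    pub fn push(&self, value: T) -> Result<(), T> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail - head == self.slots.len() {
            return Err(value);
        }
        unsafe { *self.slots[tail % self.slots.len()].get() = Some(value) };
        self.tail.store(tail + 1, Ordering::Release);
        Ok(())
    }

    /// Consumer side. Returns None if empty; never blocks.
    pub fn pop(&self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let value = unsafe { (*self.slots[head % self.slots.len()].get()).take() };
        self.head.store(head + 1, Ordering::Release);
        value
    }
}

/// Poll-mode consumer: spins on the ring until `n` items are handled.
pub fn poll_sum(ring: &SpscRing<u64>, n: usize) -> u64 {
    let (mut sum, mut seen) = (0, 0);
    while seen < n {
        match ring.pop() {
            Some(v) => {
                sum += v;
                seen += 1;
            }
            None => hint::spin_loop(), // nothing yet: keep polling
        }
    }
    sum
}
```

The consumer never sleeps, which is exactly why it wants a core of its own: the latency win comes from never paying for a wakeup.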

gpderetta 2363 days ago

Yes, in practice you have to dedicate the whole machine to a specific application, but one thread per isolated core is a proven design for high-performance/low-latency applications.
