Hacker News new | ask | show | jobs
by pizza234 2086 days ago
For busy waiting in general, AFAIK one should signal the intention to the compiler, via intrinsics (`_mm_pause` in C, `std::sync::atomic::spin_loop_hint` in Rust, and so on). The compiler and processor will attempt to make the best out of the pause, e.g. send the processor to a lower-consumption state.

It may be overkill for such cases, but if one keeps the strategy as reference, it's best to know the optimal one :-)

I even think it's not overkill for this case, as they're actually working around the compiler, and a clean approach is preferrable regardless.

2 comments

>e.g. send the processor to a lower-consumption state.

You have to be very careful in a bootloader, since you're initializing the clocks and other very low level subsystems you don't want to risk putting the CPU in a state you won't be able to get out of.

In general the standard method is to use a dummy busy delay like the "nop loop" shown above in the very early stages until you get to the point where you can enable interrupts and standby states and implement a proper sleep method that will effectively shut the CPU down waiting for an external event.

If you're talking about userland code (or even driver code) using busyloops you're right though.

Isn’t _mm_pause an Intel intrinsic?
Based on what I see on [Stack Overflow](https://stackoverflow.com/a/51908785), it's recognized by all the major compilers

The [Microsoft Compiler help](https://docs.microsoft.com/en-us/cpp/intrinsics/x86-intrinsi...) includes it in the "x86 intrinsics list".

I've just compiled with success, for testing purposes, on an AMD processor.

C, relevant section:

    #include <immintrin.h>

    void test() {
        _mm_pause();
    }
Output ASM, relevant section:

    test:
    .LFB4006:
      .cfi_startproc
      endbr64
      pushq %rbp
      .cfi_def_cfa_offset 16
      .cfi_offset 6, -16
      movq  %rsp, %rbp
      .cfi_def_cfa_register 6
      rep nop
      nop
      nop
      popq  %rbp
      .cfi_def_cfa 7, 8
      ret
      .cfi_endproc
    .LFE4006:
      .size test, .-test
      .globl  main
      .type main, @function

It should be the `rep nop`.

Including `ammintrin.h` yields the same `rep nop`.

By “Intel” I really meant “x86” :P Is there an ARM compiler that recognizes it?
Unlikely, intrinsics are CPU, and sometimes CPU model, specific. Compiler builtins will instead use the intrinsic if it's available/suitable or generate equivalent code if it's not.
Ah! Sorry for that; I have no idea.

It seems that at least some ARMs have [timers that can be used](https://electronics.stackexchange.com/q/450971), which is a different strategy, but still more appropriate (I suppose).