Hacker News
by viraptor 1066 days ago
Would anything prevent compilers from approaching it like SSE on Intel? Check for feature presence and enable the appropriate path (if using compiler-generated code).

> Check for feature presence

That is the hard part. On Intel you can use CPUID, but ARM's policy is not to expose such an instruction. You can read /proc/cpuinfo, but that is Linux-specific.

Edit: there is a reason for ARM's policy: CPUID is a well-known virtualization hazard. In fact, KVM immediately traps if you execute CPUID in a guest. ARM made a good decision here. Still, it means things can't work exactly as they do on Intel.
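A rough sketch of the /proc/cpuinfo route, in C and Linux-only as noted above. It assumes the feature list sits on a line starting with "Features" (Arm) or "flags" (x86), which is how current kernels format it; has_token and cpuinfo_has_feature are hypothetical helper names, not from any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `feature` appears as a whole whitespace-delimited token in `list`. */
static int has_token(const char *list, const char *feature)
{
    size_t n = strlen(feature);
    if (n == 0)
        return 0;
    for (const char *p = list; (p = strstr(p, feature)) != NULL; p += n) {
        int starts = (p == list) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Linux only: look for `feature` on the "Features" (Arm) or "flags" (x86)
   line of /proc/cpuinfo. Returns 1 if found, 0 if not, -1 if the file
   cannot be opened. */
static int cpuinfo_has_feature(const char *feature)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    char *line = NULL; /* getline grows this buffer, so long x86 flag lines fit */
    size_t cap = 0;
    int found = 0;
    while (!found && getline(&line, &cap, f) != -1) {
        if (strncmp(line, "Features", 8) == 0 || strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            if (colon != NULL && has_token(colon + 1, feature))
                found = 1;
        }
    }
    free(line);
    fclose(f);
    return found;
}
```

Matching whole tokens matters: a naive substring search would report "sse" as present on any CPU that lists only "sse2" or "ssse3".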

The Arm Linux kernel lets you use some of the "read ID register" instructions from userspace by trapping them in the kernel and emulating them, which presents you with a slightly sanitized view of the available hardware: https://www.kernel.org/doc/html/v5.8/arm64/cpu-feature-regis...
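A minimal sketch of that emulated access, assuming arm64 Linux and a toolchain that accepts the ID_AA64ISAR0_EL1 register name in inline asm. The kernel advertises the emulation through the HWCAP_CPUID bit of AT_HWCAP, so the sketch checks that bit before executing MRS; the fallback value 1 << 11 is copied from the arm64 uapi header. read_isar0 and isar0_aes_field are hypothetical helpers, and on any other platform the sketch simply reports nothing.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11) /* value from the arm64 uapi <asm/hwcap.h> */
#endif
#endif

/* Store ID_AA64ISAR0_EL1 in *out and return 1 if the kernel advertises
   emulated ID register access; return 0 otherwise (always 0 off arm64 Linux). */
static int read_isar0(uint64_t *out)
{
#if defined(__aarch64__) && defined(__linux__)
    if (!(getauxval(AT_HWCAP) & HWCAP_CPUID))
        return 0;
    uint64_t val;
    /* Traps to the kernel, which emulates the read and returns a sanitized value. */
    __asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(val));
    *out = val;
    return 1;
#else
    (void)out;
    return 0;
#endif
}

/* The AES field is bits [7:4]: 0 = not implemented, 1 = AES, 2 = AES plus PMULL. */
static unsigned isar0_aes_field(uint64_t isar0)
{
    return (unsigned)((isar0 >> 4) & 0xF);
}
```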

You can also look at the hwcaps (available in the ELF aux vector) -- this is the older mechanism.
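For comparison, a sketch of the hwcaps route through getauxval(AT_HWCAP), which glibc provides (musl and Bionic have it too). The ARM64_HWCAP_* macros are hypothetical local names holding the bit positions of HWCAP_ASIMD (bit 1) and HWCAP_AES (bit 3) from the arm64 uapi <asm/hwcap.h>; decode_hwcap and detect_features are likewise illustrative.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#endif

/* Hypothetical local names; the bit positions match HWCAP_ASIMD and HWCAP_AES
   in the arm64 uapi <asm/hwcap.h>. */
#define ARM64_HWCAP_ASIMD (1UL << 1)
#define ARM64_HWCAP_AES   (1UL << 3)

struct arm64_features {
    int asimd; /* Advanced SIMD (Neon) */
    int aes;   /* AES instructions */
};

/* Decode an AT_HWCAP word as the arm64 Linux kernel reports it. */
static struct arm64_features decode_hwcap(unsigned long hwcap)
{
    struct arm64_features f;
    f.asimd = (hwcap & ARM64_HWCAP_ASIMD) != 0;
    f.aes = (hwcap & ARM64_HWCAP_AES) != 0;
    return f;
}

/* Ask the running kernel via the ELF auxiliary vector; report nothing elsewhere. */
static struct arm64_features detect_features(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return decode_hwcap(getauxval(AT_HWCAP));
#else
    return decode_hwcap(0);
#endif
}
```

Keeping the decoding separate from the query makes the bit logic testable on any machine, and because the kernel fills in AT_HWCAP, the answer already reflects any sanitizing it chose to do.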

It's true that there's no cross-OS mechanism to do this, but that's life -- often the OS wants to get involved anyway to sanitize the answers (e.g. so it can tell you "feature X is not present" when it knows about a hardware erratum or the OS was built without feature-X support).

https://news.ycombinator.com/item?id=18542040 talks about registers with a similar purpose.
Only at EL1, which means you can only use them in the kernel.
But specific instructions should be testable anyway, right? Try to execute with an exception handler and you'll know.
On Arm this is generally a bad idea -- there are, or were, some corner cases where the kernel can know that an extension shouldn't be used, but it doesn't have a mechanism for "make the instructions UNDEF". The example I know about is ancient history now -- on the Cortex-A8 I think you could build a kernel without Neon support or perhaps the kernel might find there was a Neon-related erratum, but there was no way to disable Neon to force the UNDEFs.

The recommended approach is to use HWCAPs, or else to use the kernel's "emulated ID register accesses" functionality.

Debian distributes compiled binaries. Thus, they have to either turn processor features on in all their binaries, or off (or distribute two sets of binaries).
There are options for runtime instruction selection:

https://wiki.debian.org/InstructionSelection

You're thinking of optimizing the code for a specific processor. Run-time code paths that detect CPU features have existed since MMX and SSE.
Debian does distribute SSE-using binaries (with a fallback) on i386 that detect the presence of SSE at runtime using CPUID.
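A sketch of that runtime-dispatch pattern on x86, assuming GCC or Clang. __builtin_cpu_supports is a real builtin in both compilers (built on CPUID), and the target("sse2") function attribute lets one function use SSE2 even when the rest of the binary targets a baseline without it. The function names sum_scalar, sum_sse2 and select_sum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);

/* Portable fallback: plain C that runs on any CPU. */
static uint32_t sum_scalar(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

#ifdef HAVE_X86_DISPATCH
/* Compiled with SSE2 enabled even if the rest of the binary targets an older baseline. */
static __attribute__((target("sse2"))) uint32_t sum_sse2(const uint32_t *a, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(a + i)));
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint32_t s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) /* leftover elements */
        s += a[i];
    return s;
}
#endif

/* Choose an implementation from what the running CPU reports. */
static sum_fn select_sum(void)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse2"))
        return sum_sse2;
#endif
    return sum_scalar;
}
```

A caller would typically run `sum_fn f = select_sum();` once at startup and reuse f, so the feature check isn't repeated on every call. Both paths use unsigned wraparound arithmetic, so they return identical results.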