by prattmic 2170 days ago
While CPUID itself is far from elegant, I don't think ARM is fundamentally different. e.g., the ID_AA64ISAR1_EL1 register (ARM wins no naming awards here) indicates the presence of certain instructions.

https://developer.arm.com/docs/ddi0595/h/aarch64-system-regi...


aarch64 is actually worse as far as I can tell, because reading from that register is restricted and cannot be done from userspace (hence "EL1"), so doing this portably is a huge pain. On Linux I think some of this is exported via /proc/cpuinfo and the auxiliary vector (getauxval), so that's what most people use…

Edit: Also there's a ton of them, this lists a couple: https://www.kernel.org/doc/html/latest/arm64/cpu-feature-reg...
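For the curious, here is a minimal sketch of the two Linux routes mentioned above. The function names parse_cpuinfo_features and read_hwcap are made up for illustration. It assumes the "Features" line that arm64 kernels print in /proc/cpuinfo (x86 kernels call it "flags") and a libc that exports getauxval; decoding the individual HWCAP bits is left out, since the bit assignments are architecture-specific.

```python
import ctypes
import sys

AT_HWCAP = 16  # auxiliary-vector key for the hardware capability bitmask (<elf.h>)


def parse_cpuinfo_features(text):
    """Return {processor index: set of feature names} from /proc/cpuinfo text.

    arm64 kernels label the line "Features"; x86 kernels label it "flags".
    """
    per_cpu = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        value = value.strip()
        if key == "processor" and value.isdigit():
            current = int(value)
            per_cpu.setdefault(current, set())
        elif key in ("Features", "flags") and current is not None:
            per_cpu[current].update(value.split())
    return per_cpu


def read_hwcap():
    """Return the AT_HWCAP bitmask via libc's getauxval, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded into this process
        getauxval = libc.getauxval
    except (OSError, AttributeError):
        return None
    getauxval.argtypes = [ctypes.c_ulong]
    getauxval.restype = ctypes.c_ulong
    return getauxval(AT_HWCAP)
```

On a real machine you would feed it open("/proc/cpuinfo").read(); the parser takes a string so it can be tried on any platform.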

It makes good sense for the OS to abstract this for a few reasons:

- You'll get into trouble when migrating a process between cores in heterogeneous environments (having cores that support different ISA extensions is relatively uncommon today, but Intel has already announced some, and people have gotten into trouble in the past with e.g. reading cacheline sizes when big and little cores used different values).

- This can be a pretty slow operation on some platforms, and the system can make it more efficient by caching it (and can provide additional capabilities information that may not have existed on older CPUs so that there's a uniform interface).
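To make the two points above concrete, here is a rough sketch of what such an abstraction might do: report only the features every core has, and probe the hardware once rather than on every query. FeatureCache, common_features and the probe callable are hypothetical names, not any real kernel or libc interface.

```python
def common_features(per_cpu):
    """Features present on every core: the lowest common denominator."""
    sets = [set(feats) for feats in per_cpu.values()]
    if not sets:
        return frozenset()
    return frozenset(set.intersection(*sets))


class FeatureCache:
    """Probe once, then answer every later query from memory."""

    def __init__(self, probe):
        self._probe = probe      # hypothetical callable returning {cpu: features}
        self._cached = None
        self.probe_calls = 0     # counts how often the slow path actually ran

    def supports(self, feature):
        if self._cached is None:
            self.probe_calls += 1
            self._cached = common_features(self._probe())
        return feature in self._cached
```

With a probe that reports SVE2 on only one of two cores, supports("sve2") comes back False on every core, which is the safe answer for a process that may be migrated.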

I remember when that came up–I still don't understand how you are supposed to support a process that queries the kernel for some feature ("can I use SVE2?"), gets an affirmative response, starts using it, and midway gets migrated to another core that doesn't support those instructions. Either you're going to have the same extensions on all the cores, or lie about it by presenting the lowest common denominator to each of them…although I guess if you did want to do that, you'd want the kernel to be able to fake this from the start.
One approach could be that the OS pins the process or threads to cores that support the features that were queried for.

That is, if a process asks "can I use SVE2?" and gets a yes, the OS doesn't move it to any core for which that isn't true.
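A userspace approximation of that pinning idea, as a sketch only: the proposal is for the OS to do this itself, whereas this version sets the affinity mask by hand. It assumes a per-core feature map is available from somewhere (the per_cpu argument is hypothetical input) and uses os.sched_setaffinity, which Python only provides on platforms that support it, such as Linux.

```python
import os


def cpus_supporting(per_cpu, feature):
    """Return the set of core indices whose feature set contains `feature`."""
    return {cpu for cpu, feats in per_cpu.items() if feature in feats}


def pin_to_feature(per_cpu, feature, pid=0):
    """Restrict `pid` (0 means the calling process) to cores advertising `feature`.

    Returns the chosen CPU set. If no core qualifies, returns an empty set
    and leaves the affinity mask untouched.
    """
    cpus = cpus_supporting(per_cpu, feature)
    if cpus and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(pid, cpus)
    return cpus
```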

Right, but as I mentioned here: https://news.ycombinator.com/item?id=23835764 (and last month: https://news.ycombinator.com/item?id=23481049) every process is going to ask this…
Ah, that's a very good point.
Alternatively, an invalid instruction fault is signaled, handled by the OS, which activates the core capable of that instruction and migrates the process there. Not cheap, but I would expect that to happen rarely, if the lesser core is only used during low-power operation.
Yeah, but that means that every process is going to be forced to run on the big core the moment that you call into libc's vectorized string functions…
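A toy model of the fault-and-migrate scheme and the objection to it. Everything here is made up for illustration (core names, feature names, the function itself): each "instruction" is just the name of the feature it needs, and a fault moves the process to the first core that has it. Nothing in the model ever moves the process back.

```python
def run_with_fault_migration(instructions, cores, start_core):
    """Simulate trap-and-migrate over a stream of required features.

    `cores` maps a core name to the set of features it supports. On an
    unsupported instruction the process "faults" and moves to the first
    core (in dict order) that supports it.
    Returns (final core, number of migrations).
    """
    current = start_core
    migrations = 0
    for needed in instructions:
        if needed in cores[current]:
            continue  # instruction executes normally on the current core
        target = next((c for c, feats in cores.items() if needed in feats), None)
        if target is None:
            raise RuntimeError(f"no core supports {needed!r}")
        current = target
        migrations += 1
    return current, migrations
```

With cores = {"little": {"base"}, "big": {"base", "sve2"}}, a stream of plain instructions containing a single "sve2" (say, one vectorized string call) is enough to leave the process on "big" after one migration.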
There are similar issues for any suspend-able and/or hot-swappable system. An extension to process signals should be created to inform them that a system-context change has occurred. Clock timestamps, features, all assumptions should be invalidated.
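Roughly what the program side of that might look like, as a sketch. No such signal exists today, so SIGUSR1 stands in for the hypothetical "system context changed" notification, and CachedFacts, make_handler and install_context_change_handler are invented names.

```python
import signal


class CachedFacts:
    """Holds facts about the system (features, clocks) until told they are stale."""

    def __init__(self, probe):
        self._probe = probe   # hypothetical callable that re-queries the system
        self._value = None
        self.generation = 0   # bumped on every invalidation

    def get(self):
        if self._value is None:
            self._value = self._probe()
        return self._value

    def invalidate(self):
        self._value = None
        self.generation += 1


def make_handler(cache):
    """Build a signal handler that drops everything `cache` has remembered."""
    def handler(signum, frame):
        cache.invalidate()
    return handler


def install_context_change_handler(cache, signum=None):
    """Register the handler; SIGUSR1 is only a stand-in for the proposed signal.

    Python requires this to be called from the main thread.
    """
    if signum is None:
        signum = signal.SIGUSR1
    signal.signal(signum, make_handler(cache))
```

The next get() after the signal arrives re-probes, so the program picks up whatever the new context supports.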
Yeah, but then programs have to know about that. Up until now the general assumption has been that context switches are largely transparent…
I think Intel is going the "lie about it" route for their first “hybrid” chip.
Nah, they just disable all the features that aren't shared by the little cores.
That's the "lie about it" route I mentioned above.