by Paul_Clayton 61 days ago
When ARM moved to 64-bit, the ISA was reworked much more substantially than in AMD's x86-64 transition. The latter mainly added modes and repurposed the one-byte INCrement and DECrement opcodes (0x40-0x4F) as the REX prefix, which provides a 64-bit operand-size bit and one additional bit for each register specifier; obviously the page table format also changed. I am not particularly familiar with AArch64, but I got the impression that the main cruft retained from 32-bit ARM was condition codes, and condition codes involve real tradeoffs, so some would not consider them cruft at all. The use of four bits in almost every instruction to support predication, which was a major cruft point for 32-bit ARM, was eliminated. The original ARM's orientation toward shift-then-ALU operations (which exploited timing slack from the slowness of instruction fetch) was also de-emphasized.
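To make the two encoding details above concrete, here is a minimal sketch in Python. The function names are mine, chosen for illustration; the bit layouts are the documented ones (a REX prefix is a byte of the form 0100WRXB in 64-bit mode, and the 32-bit ARM condition field occupies bits 31:28 of the instruction word).

```python
def decode_rex(byte):
    """Return the W, R, X, B bits if `byte` is a REX prefix (0x40-0x4F), else None."""
    if byte & 0xF0 != 0x40:
        return None
    return {
        "W": (byte >> 3) & 1,  # 1 selects 64-bit operand size
        "R": (byte >> 2) & 1,  # extra high bit for the ModRM.reg field
        "X": (byte >> 1) & 1,  # extra high bit for the SIB.index field
        "B": byte & 1,         # extra high bit for ModRM.rm, SIB.base, or an opcode register field
    }


def arm32_condition(instruction):
    """Return the 4-bit condition field (bits 31:28) of a 32-bit ARM instruction word."""
    return (instruction >> 28) & 0xF


print(decode_rex(0x48))                   # REX.W, the common "64-bit operand" prefix
print(hex(arm32_condition(0xE1A00000)))   # MOV r0, r0 with the "always" condition
```

In 32-bit mode the same 0x40-0x4F bytes still decode as INC/DEC, which is why the repurposing only applies in 64-bit mode.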

AArch64 is accumulating cruft, perhaps particularly with respect to SIMD, but it is less crufty than x86-64.

ISA modularity/diversity can be useful for embedded systems, where the software is really firmware. If one is going to have to provide a diversity of compilation targets, via either a common distribution format that is compiled to the local machine code or an app store that receives a software format that can be compiled to diverse targets, the best distribution format (to users or the app store) is likely to be significantly different from an encoding best suited to direct execution.

Some optional features can be hidden by system libraries (particularly when the main use of the feature is suitable for a separate accelerator). E.g., an instruction that performs a round of AES encryption could be hidden behind an encryption library. However, some uses of an AES instruction involve a very short "message" for which library overhead would be excessive or for which good enough software alternatives would be faster than actual AES.
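The short-message point can be illustrated with a toy cost model. This is only a sketch: the cycle counts below are invented for illustration, not measurements of any real library or processor, and the function name is hypothetical.

```python
def library_overhead_fraction(num_blocks, call_overhead_cycles, cycles_per_block):
    """Fraction of total time spent on fixed library-call overhead."""
    total = call_overhead_cycles + num_blocks * cycles_per_block
    return call_overhead_cycles / total


# Hypothetical numbers, chosen only to show the shape of the tradeoff.
CALL_OVERHEAD = 100      # assumed fixed cost of entering the library
CYCLES_PER_BLOCK = 40    # assumed cost of encrypting one 16-byte block

for blocks in (1, 4, 64, 1024):
    frac = library_overhead_fraction(blocks, CALL_OVERHEAD, CYCLES_PER_BLOCK)
    print(f"{blocks:5d} block(s): {frac:.0%} of time is call overhead")
```

With any fixed per-call cost, the overhead share shrinks as the message grows, so the library approach is a poor fit precisely for the single-block uses.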

Indexed memory accesses and conditional select/move, for example, are not really suitable to system libraries (or trapping to software even with a very fast trap handler).

ISA scaling is not necessarily a good design feature. An ISA optimized for the market targeted by ARM's M profile is unlikely to be optimal for future high-performance processors with 16-wide decode. E.g., if a context only has 16 registers, using 5-bit register specifiers is suboptimal even though it allows software to be "upward compatible" with a 32-register design.
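The register-specifier arithmetic is simple enough to sketch. The helper names are hypothetical, and the three-operand instruction format is an assumption for illustration rather than a description of any particular encoding.

```python
def specifier_bits(num_registers):
    """Minimum number of bits needed to name one of `num_registers` registers."""
    return (num_registers - 1).bit_length()


def wasted_bits(encoded_registers, usable_registers, operands=3):
    """Encoding bits per instruction spent on registers the context does not have."""
    per_operand = specifier_bits(encoded_registers) - specifier_bits(usable_registers)
    return per_operand * operands


print(specifier_bits(16))    # bits per specifier with 16 registers
print(specifier_bits(32))    # bits per specifier with 32 registers
print(wasted_bits(32, 16))   # bits unused per three-operand instruction
```

Under these assumptions a three-operand instruction gives up three encoding bits, which in a compact instruction format could otherwise go to immediates or opcode space.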