by api 1791 days ago
Given that all these are ARX cores, I wonder if a fused ARX instruction could cover a wide range of them?
2 comments

ARMv8.2 has rotate-and-XOR (RAX1) and XOR-and-rotate (XAR) instructions, part of the optional SHA-3 extension, so the (extremely cheap) separate XOR instruction can be saved.
Probably. With a barrel shifter, easily. Barrel shifters are a bit slower than wired shifts though, so for ultimate speed you'd end up with hardwired shift amounts.
The advantage would be a single instruction that implemented the core of a wide range of things including BLAKE2, BLAKE3, Salsa, ChaCha, Speck, and more.
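To make that concrete, here is a rough Python sketch of what such a fused op might look like, using the ChaCha quarter round as the example. The `arx` helper and its operand order are purely hypothetical (no real ISA is being described); it just bundles "add, xor, rotate left" into one step on 32-bit words. Other ARX designs order the three operations differently (Salsa20, for instance, does add, rotate, then xor), so a real instruction would need some flexibility there.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(x, r):
    """Rotate a 32-bit word left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def arx(a, b, d, r):
    """Hypothetical fused op: a += b; d ^= a; d <<<= r (mod 2**32).

    Returns the updated (a, d) pair. This is an illustration of the
    idea in the thread, not an existing instruction.
    """
    a = (a + b) & MASK32
    d = rotl32(d ^ a, r)
    return a, d


def chacha_quarter_round(a, b, c, d):
    """ChaCha quarter round written as four calls to the fused op."""
    a, d = arx(a, b, d, 16)
    c, b = arx(c, d, b, 12)
    a, d = arx(a, b, d, 8)
    c, b = arx(c, d, b, 7)
    return a, b, c, d
```

Note that the rotate amounts (16, 12, 8, 7) are constants, which is what would make hardwired shifts an option for a ChaCha-only unit; covering several ciphers with one instruction is where the barrel shifter comes in.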