by SAI_Peregrinus 1790 days ago
And if you're designing hardware (or on an FPGA) ChaCha (and BLAKE2/3) are actually really freakin' fast for the die area they take. IIRC ChaCha20 beats AES in hardware. It's just extremely rare (FPGA only) to find it in hardware, since there's not enough demand. I expect that to eventually change.

Given that all these are ARX cores, I wonder if a fused ARX instruction could cover a wide range of them?
ARMv8.2 has rotate-and-xor (RAX1) and xor-and-rotate (XAR) in its optional SHA-3 extension, so that the (extremely cheap) xor doesn't cost a separate instruction.
Probably. With a barrel shifter, easily. Barrel shifters are a bit slower than wired shifts though, so for ultimate speed you'd end up with hardwired shift amounts.
The advantage would be a single instruction that implemented the core of a wide range of things including BLAKE2, BLAKE3, Salsa, ChaCha, Speck, and more.
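To make the idea concrete, here's a rough sketch of the ChaCha quarter round written as four calls to a single fused add-xor-rotate step. The `arx` function is hypothetical (no real ISA is being described here); it just assumes the instruction would hand back both the 32-bit sum and the rotated xor, since ChaCha needs both. The check at the bottom uses the quarter-round test vector from RFC 8439, section 2.1.1.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def arx(x, y, z, r):
    """Hypothetical fused ARX step (an assumption, not a real instruction).

    Computes s = x + y (mod 2**32) and returns (s, rotl32(z ^ s, r)).
    """
    s = (x + y) & MASK32
    return s, rotl32(z ^ s, r)


def chacha_quarter_round(a, b, c, d):
    """ChaCha quarter round expressed as four fused ARX steps."""
    a, d = arx(a, b, d, 16)
    c, b = arx(c, d, b, 12)
    a, d = arx(a, b, d, 8)
    c, b = arx(c, d, b, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Test vector from RFC 8439, section 2.1.1.
    out = chacha_quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(v) for v in out])
```

Only the rotation amounts (16, 12, 8, 7) change between the four steps, and other ARX designs use different amounts and word sizes, which is where the barrel shifter vs. hardwired shift trade-off comes in.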