by colmmacc 1197 days ago
It's very much worth doing in this context ... almost all of the assembly I've written in the last ten years has been for routines like this. Compilers are very smart, but it's hard for them to optimize concurrent, interleaved cryptographic algorithms to be cache-efficient, pipeline-efficient and operation-efficient all at the same time.

AES-GCM is "AES" (in counter mode) and "GHASH" running at the same time on the same data. ChaCha20-Poly1305 is "ChaCha20" and "Poly1305" running at the same time on the same data, usually block by block so that you avoid pulling the data into cache more than once. You can interleave their imperative operations in C or Rust (or whatever) ... but without a lot of hints the compiler isn't going to intuit how some of the math can be reused across the algorithms, or how it can be safely vectorized, and at that point you might as well just write the assembly.
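To make "block by block" concrete, here's a rough sketch in Python. Python is my choice for readability, and it won't show any real cache or pipeline win; it only shows the loop shape. It assumes the RFC 8439 ChaCha20-Poly1305 layout: keystream block 0 becomes the one-time Poly1305 key, data blocks start at counter 1, and the MAC covers the zero-padded AAD, the zero-padded ciphertext and the two lengths. `seal_fused` hands each 64-byte ciphertext block to Poly1305 right after producing it, while `seal_two_pass` encrypts the whole buffer first and MACs it afterwards; both should give identical output. All names here are hypothetical, the code is not constant-time, it hasn't been checked against the RFC test vectors, and it doesn't attempt the vectorization or shared-math tricks mentioned above (that's the part you'd hand-write in assembly).

```python
import struct

MASK32 = 0xFFFFFFFF
P1305 = (1 << 130) - 5
# Column rounds, then diagonal rounds (one ChaCha "double round").
_QR_INDICES = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
               (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))


def _rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK32


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """One 64-byte keystream block: 32-byte key, 32-bit counter, 12-byte nonce."""
    init = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574,
            *struct.unpack("<8I", key), counter & MASK32, *struct.unpack("<3I", nonce)]
    s = list(init)
    for _ in range(10):  # 20 rounds = 10 double rounds
        for idx in _QR_INDICES:
            _quarter_round(s, *idx)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, init)))


class Poly1305:
    """Incremental Poly1305 that only accepts whole 16-byte blocks."""

    def __init__(self, one_time_key: bytes):
        self.r = int.from_bytes(one_time_key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
        self.s = int.from_bytes(one_time_key[16:32], "little")
        self.acc = 0

    def update(self, data: bytes) -> None:
        assert len(data) % 16 == 0, "caller must zero-pad to 16 bytes"
        for i in range(0, len(data), 16):
            n = int.from_bytes(data[i:i + 16] + b"\x01", "little")  # block + 2^128
            self.acc = (self.acc + n) * self.r % P1305

    def tag(self) -> bytes:
        return ((self.acc + self.s) & ((1 << 128) - 1)).to_bytes(16, "little")


def _pad16(b: bytes) -> bytes:
    return b + b"\x00" * (-len(b) % 16)


def seal_fused(key: bytes, nonce: bytes, plaintext: bytes, aad: bytes = b"") -> tuple:
    mac = Poly1305(chacha20_block(key, 0, nonce)[:32])  # block 0 -> MAC key
    mac.update(_pad16(aad))
    ct = bytearray()
    for i in range(0, len(plaintext), 64):
        ks = chacha20_block(key, 1 + i // 64, nonce)
        c = bytes(p ^ k for p, k in zip(plaintext[i:i + 64], ks))
        mac.update(_pad16(c))  # MAC this block now; only the last one can need padding
        ct += c
    mac.update(struct.pack("<QQ", len(aad), len(plaintext)))
    return bytes(ct), mac.tag()


def seal_two_pass(key: bytes, nonce: bytes, plaintext: bytes, aad: bytes = b"") -> tuple:
    # Pass 1: encrypt the whole buffer.
    ct = b"".join(
        bytes(p ^ k for p, k in zip(plaintext[i:i + 64], chacha20_block(key, 1 + i // 64, nonce)))
        for i in range(0, len(plaintext), 64))
    # Pass 2: walk all of the ciphertext again to compute the MAC.
    mac = Poly1305(chacha20_block(key, 0, nonce)[:32])
    mac.update(_pad16(aad) + _pad16(ct) + struct.pack("<QQ", len(aad), len(ct)))
    return ct, mac.tag()
```

The fused loop is the shape the hand-written implementations exploit: each block is encrypted and authenticated while it's still in registers or L1, rather than the whole buffer being streamed through the cache twice. For example, `seal_fused(bytes(range(32)), bytes(12), b"x" * 200, b"hdr")` should equal `seal_two_pass` called with the same arguments.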