Hacker News
by conradev 3609 days ago
Is ChaCha20 actually implemented in hardware on any platforms? I was under the impression that the algorithm itself is just really, really fast in software (especially so with SIMD).
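To illustrate why ChaCha is fast in software: its core quarter-round uses only 32-bit additions, XORs and fixed rotations, all of which map directly to ordinary (and SIMD) instructions. A minimal Python sketch of the quarter-round from RFC 7539, section 2.1; the function names are my own, and Python is obviously not where the speed comes from:

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a, b, c, d):
    """ChaCha quarter-round (RFC 7539, section 2.1): only add, rotate and xor."""
    a = (a + b) & MASK32; d ^= a; d = rotl32(d, 16)
    c = (c + d) & MASK32; b ^= c; b = rotl32(b, 12)
    a = (a + b) & MASK32; d ^= a; d = rotl32(d, 8)
    c = (c + d) & MASK32; b ^= c; b = rotl32(b, 7)
    return a, b, c, d


if __name__ == "__main__":
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in out])
```

The inputs in the usage example are the quarter-round test vector from RFC 7539, section 2.1.1, which should yield 0xea2a92f4, 0xcb1cf8ce, 0x4581472e and 0x5881c4bb.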

I implemented ChaCha20 in AArch64 assembly, and it was possible to encrypt/decrypt 6 blocks at once.
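Encrypting several blocks at once works because each 64-byte ChaCha20 block depends only on the key, nonce and block counter, so blocks with consecutive counters can be computed independently in separate SIMD lanes. A scalar Python sketch of the RFC 7539 block function plus a helper (hypothetical name `keystream`) that produces several consecutive blocks; it is not the AArch64 code, just the same data flow:

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(s, a, b, c, d):
    """In-place ChaCha quarter-round on words a, b, c, d of state list s."""
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """One 64-byte ChaCha20 keystream block (RFC 7539, section 2.3)."""
    if len(key) != 32 or len(nonce) != 12 or not 0 <= counter <= MASK32:
        raise ValueError("need a 32-byte key, 12-byte nonce and 32-bit counter")
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    state += struct.unpack("<8I", key) + (counter,) + struct.unpack("<3I", nonce)
    w = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        # column round
        quarter_round(w, 0, 4, 8, 12)
        quarter_round(w, 1, 5, 9, 13)
        quarter_round(w, 2, 6, 10, 14)
        quarter_round(w, 3, 7, 11, 15)
        # diagonal round
        quarter_round(w, 0, 5, 10, 15)
        quarter_round(w, 1, 6, 11, 12)
        quarter_round(w, 2, 7, 8, 13)
        quarter_round(w, 3, 4, 9, 14)
    # add the input state back in and serialize little-endian
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(w, state)))


def keystream(key, nonce, first_counter, nblocks):
    """Consecutive blocks share nothing but key and nonce, so a SIMD
    implementation can compute them in parallel; here they are simply looped."""
    return b"".join(chacha20_block(key, first_counter + i, nonce) for i in range(nblocks))


if __name__ == "__main__":
    ks = keystream(bytes(range(32)), bytes.fromhex("000000090000004a00000000"), 1, 6)
    print(len(ks), ks[:16].hex())
```

With the inputs from RFC 7539, section 2.3.2, the first block should match the serialized test vector given there.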


The Cryptech project uses ChaCha as the CSPRNG in our TRNG. We decided on ChaCha because of its performance and good security margin. I know of at least one more project that uses our ChaCha core.
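For readers unfamiliar with this use: one common way to build a ChaCha-based CSPRNG is to seed the key from an entropy source, expand it with the block function, and overwrite the key with fresh keystream after every request (sometimes called fast key erasure), so that a later state compromise does not reveal earlier output. This is a toy Python sketch of that general pattern, not the Cryptech design; the class name `ToyChaChaDRBG` is hypothetical and `os.urandom` merely stands in for a TRNG:

```python
import os
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def _qr(s, a, b, c, d):
    # ChaCha quarter-round applied in place to state words a, b, c, d
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 7)


def _chacha20_block(key, counter, nonce):
    # RFC 7539 state: 4 constants, 8 key words, 32-bit counter, 3 nonce words
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    state += struct.unpack("<8I", key) + (counter,) + struct.unpack("<3I", nonce)
    w = list(state)
    for _ in range(10):  # four column rounds, then four diagonal rounds
        for a, b, c, d in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
                           (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)):
            _qr(w, a, b, c, d)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(w, state)))


class ToyChaChaDRBG:
    """Toy fast-key-erasure generator (illustrative only, not production code)."""

    def __init__(self, seed):
        if len(seed) != 32:
            raise ValueError("seed must be 32 bytes")
        self._key = seed

    def read(self, n):
        if not 0 <= n <= 1 << 20:
            raise ValueError("request size out of range for this toy")
        nonce = bytes(12)  # a fixed nonce is tolerable here only because the key changes every call
        buf = bytearray()
        counter = 0
        while len(buf) < 32 + n:
            buf += _chacha20_block(self._key, counter, nonce)
            counter += 1
        self._key = bytes(buf[:32])  # first 32 bytes become the next key
        return bytes(buf[32:32 + n])


if __name__ == "__main__":
    rng = ToyChaChaDRBG(os.urandom(32))  # os.urandom stands in for the TRNG
    print(rng.read(16).hex())
```

A real design would also reseed from the TRNG periodically and handle health-test failures; those parts are deliberately left out.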

https://cryptech.is/

ChaCha can be implemented efficiently in HW, especially in FPGAs that support carry chains, which basically means most FPGAs.
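Why carry chains matter here: ChaCha's additions are 32-bit modular adds, and in plain logic the slow part of an adder is the carry rippling from bit 0 up to bit 31. A bit-level Python model of a ripple-carry adder makes that serial carry path explicit; FPGA carry chains implement this carry path in dedicated logic. The function name is mine and this is a behavioral model, not HDL:

```python
import random

MASK32 = 0xFFFFFFFF


def ripple_carry_add32(a, b):
    """Bit-level model of a 32-bit ripple-carry adder (sum modulo 2**32)."""
    result = 0
    carry = 0
    for i in range(32):                        # carry moves serially from bit 0 upward
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i         # sum bit
        carry = (x & y) | (carry & (x ^ y))    # generate, or propagate incoming carry
    return result                              # carry-out of bit 31 is dropped


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        assert ripple_carry_add32(a, b) == (a + b) & MASK32
    print("ripple-carry model matches 32-bit modular addition")
```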

It is somewhat hard to compare size and speed, since both ChaCha and AES are so scalable. In ChaCha there are many places where you can trade operator reuse against performance. But the fundamental operator size is 64 bits.
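One way to see the reuse/performance trade-off is a toy cycle-count model. Assume (hypothetically) that each quarter-round unit finishes one quarter-round per clock and that a round's four quarter-rounds must complete before the next round starts; ChaCha20 has 20 rounds of four quarter-rounds each. The model ignores the final addition, I/O and pipelining across rounds, and says nothing about any specific core's timing:

```python
import math

QR_PER_ROUND = 4   # four independent quarter-rounds per column or diagonal round
ROUNDS = 20        # ChaCha20
BLOCK_BITS = 512


def cycles_per_block(qr_units):
    """Toy model: one quarter-round per unit per clock, rounds strictly sequential."""
    if not 1 <= qr_units <= QR_PER_ROUND:
        raise ValueError("model covers 1..4 parallel quarter-round units")
    return ROUNDS * math.ceil(QR_PER_ROUND / qr_units)


if __name__ == "__main__":
    for units in (1, 2, 3, 4):
        c = cycles_per_block(units)
        print(f"{units} QR unit(s): {c} cycles/block, {BLOCK_BITS / c:.1f} bits/cycle")
```

Under these assumptions, going from 1 to 4 units cuts the cycle count per block from 80 to 20, while 3 units buy nothing over 2, because a round's four quarter-rounds still need two passes.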

AES, in comparison, works on bytes, and you can go from a single S-box (implemented as a table, as logic, as part of a T-box, etc.) that is reused in both the datapath and the key expansion, all the way to a humongous, fully pipelined implementation (10-14 rounds). It is very flexible and easy to adapt to the system requirements. One additional thing to note with AES is that for many cipher modes, the decryption functionality can be removed.
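To make the "table vs. logic" point concrete: the AES S-box is a fixed function (inversion in GF(2^8) followed by an affine transform), so the same 256 entries can either be stored as a table or computed. A small Python sketch that derives the table from the math; the exponentiation-based inverse is chosen for clarity, and compact hardware S-boxes typically use other formulations such as tower-field arithmetic:

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return p


def gf_inv(a):
    """Inverse as a^254 (since a^255 = 1 for a != 0); 0 maps to 0, as AES specifies."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x):
    """AES S-box 'as logic': field inversion followed by the affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX_TABLE = [sbox(x) for x in range(256)]  # the same function 'as a table'

if __name__ == "__main__":
    print(" ".join(f"{v:02x}" for v in SBOX_TABLE[:16]))
```

The computed entries match the FIPS-197 table, e.g., S(0x00) = 0x63, S(0x01) = 0x7c and S(0x53) = 0xed.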

With all that said, if I compare my implementation of AES (which includes decryption) with my implementation of ChaCha20, I get about 4x better performance with ChaCha using roughly the same amount of resources.

https://github.com/secworks/chacha https://github.com/secworks/aes

The ChaCha core requires more registers, especially for the API. This is due to the bigger block size (512 vs. 128 bits).
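The register difference follows from the state sizes: a ChaCha20 block is a 4x4 matrix of 32-bit words (512 bits) holding constants, key, counter and nonce, whereas AES works on a 4x4 matrix of bytes (128 bits). A Python sketch that builds the initial ChaCha20 state in the RFC 7539 layout (section 2.3); the function name is mine:

```python
import struct

SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"


def chacha20_initial_state(key, counter, nonce):
    """16 x 32-bit words = 512 bits: 4 constants, 8 key words, 1 counter, 3 nonce words."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("RFC 7539 uses a 256-bit key and a 96-bit nonce")
    if not 0 <= counter <= 0xFFFFFFFF:
        raise ValueError("block counter is 32 bits")
    return (list(SIGMA)
            + list(struct.unpack("<8I", key))
            + [counter]
            + list(struct.unpack("<3I", nonce)))


if __name__ == "__main__":
    state = chacha20_initial_state(bytes(range(32)), 1,
                                   bytes.fromhex("000000090000004a00000000"))
    print(f"{len(state) * 32} bits of state vs. 128 bits for AES")
    for row in range(4):
        print(" ".join(f"{w:08x}" for w in state[4 * row:4 * row + 4]))
```

With these inputs (the ones from RFC 7539, section 2.3.2) the rows print the four constants, the key words 03020100 through 1f1e1d1c, and finally 00000001 09000000 4a000000 00000000.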

I like ChaCha in HW and think it's a good choice. I'm currently working on a ChaCha20-Poly1305 core compatible with RFC 7539 to make it easier for HW projects to use good AEAD ciphers.

https://tools.ietf.org/html/rfc7539
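For context on the AEAD part: Poly1305 evaluates the message as a polynomial modulo 2^130 - 5 using a clamped key half r, then adds the other key half s. A minimal Python sketch of the one-time authenticator from RFC 7539, section 2.5; it omits the Poly1305 key derivation from ChaCha20 and the padding/length encoding of the full AEAD construction (section 2.8), and Python big integers are not constant-time, so this is for illustration only:

```python
P1305 = (1 << 130) - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF


def poly1305_mac(msg, key):
    """One-time Poly1305 tag per RFC 7539, section 2.5 (32-byte one-time key)."""
    if len(key) != 32:
        raise ValueError("Poly1305 needs a 32-byte one-time key")
    r = int.from_bytes(key[:16], "little") & CLAMP  # clamped multiplier
    s = int.from_bytes(key[16:], "little")          # final additive pad
    acc = 0
    for i in range(0, len(msg), 16):
        # each chunk (up to 16 bytes) gets a 0x01 byte appended before conversion
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % P1305
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")


if __name__ == "__main__":
    key = bytes.fromhex("85d6be7857556d337f4452fe42d506a8"
                        "0103808afb0db2fd4abff6af4149f51b")
    print(poly1305_mac(b"Cryptographic Forum Research Group", key).hex())
```

The key and message in the usage example are the test vector from RFC 7539, section 2.5.2.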

Thanks for the perspective. One small correction/clarification: ChaCha operates on 32-bit words, not 64-bit ones, which makes it nice for 32-bit-only systems in software. I really wish ChaCha20/Poly1305 were included in the benchmarks for the CAESAR AEAD contest, since my understanding is that it would do a little better than NORX (at least in software; it would be interesting to see how it compares in hardware). NORX is generally the fastest of the secure non-AES options (e.g., disqualifying MORUS due to the adaptive chosen-plaintext issue identified by BRUTUS).

For those wondering why this came up now, the third round CAESAR candidates will be announced any day now. DJB's choices in Salsa20/ChaCha are still looking very good.

The ability to do relatively efficient masking/blinding in LRX algorithms is at least a major advantage, but with NORX you need 64-bit operations to get a 256-bit key, which is frustrating. I wonder if NORX32-f could be used to make a Salsa20/ChaCha-style stream cipher where you operate on block-size data (say, using the pseudo-addition to incorporate the start state).
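To illustrate the LRX point: as I understand the NORX specification, it replaces ARX's modular addition with the pseudo-addition H(x, y) = (x XOR y) XOR ((x AND y) << 1), which keeps only the first carry-generation step of addition and combines it with XOR instead of propagating carries further. With only AND, XOR and shifts there is no carry chain, which is what makes Boolean masking comparatively cheap. A Python sketch on 32-bit words (the NORX32 word size); the function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF


def pseudo_add(x, y):
    """LRX pseudo-addition: XOR plus a single, non-propagating carry step."""
    return ((x ^ y) ^ ((x & y) << 1)) & MASK32


def true_add(x, y):
    """ARX modular addition; equals (x ^ y) + ((x & y) << 1) with carries propagated."""
    return (x + y) & MASK32


if __name__ == "__main__":
    for x, y in ((1, 1), (3, 1), (0xFFFFFFFF, 1)):
        print(hex(x), hex(y), hex(pseudo_add(x, y)), hex(true_add(x, y)))
```

The two agree when no carry needs to travel more than one position (1 + 1), and diverge as soon as one does (3 + 1 gives 0 with the pseudo-addition but 4 with real addition).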

Agreed, having ChaCha20-Poly1305 in the benchmarks would be good. RFC 7539 has been published, and there are already several applications using this combination (as has been mentioned).

Any winning algorithm(s) from CAESAR will compete with ChaCha20-Poly1305 and should be chosen to provide some clear advantage: better performance, agility, scalability, or security (including resistance to side-channel leakage and other attacks on implementations), for example.

Really looking forward to seeing the round three announcement.

Sorry, my brain mistyped 32 as 64. Thanks for pointing it out.