by karim 2273 days ago
Genuinely curious, would you mind explaining why some operation can be fast in software but slow in hardware?
2 comments

I think the parent comment is saying it is fast in software on a modern CPU, but making that into an ASIC would be either a) slow or b) expensive due to the 32-bit additions.

IIRC (I can't find it right now), when NIST held the contest for AES, the candidates had to run on the low-power hardware of the late 90s/early 2000s. That meant, among other things, being fast on an 8-bit microcontroller.

To implement 32-bit + in hardware you need 31 full adders and one half adder, each of which uses multiple gates and depends on the result of the previous adder.
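For illustration only, here is a minimal Python sketch of the ripple-carry scheme described above. The names `full_adder` and `ripple_carry_add` are made up for this example, and the loop models the gate chain rather than any real circuit. Bit 0 is fed a carry-in of 0, which makes its full adder behave like the half adder mentioned above.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND and OR."""
    partial = a ^ b
    sum_bit = partial ^ carry_in
    carry_out = (a & b) | (carry_in & partial)
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=32):
    """Add two width-bit integers one bit at a time.

    Each stage needs the carry from the stage below it, so the
    carry has to 'ripple' through all `width` stages in sequence.
    Returns (sum modulo 2**width, final carry-out).
    """
    carry = 0  # carry-in of 0 makes stage 0 act as a half adder
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << i
    return result, carry
```

The worst case is something like 0xFFFFFFFF + 1, where the carry produced at bit 0 has to travel through every later stage before the top bit settles.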

Meanwhile + and bitwise AND tend to take the same number of cycles to execute, and each cycle takes the same amount of time; see https://gmplib.org/~tege/x86-timing.pdf

ChaCha20 in hardware would not be any slower than ChaCha20 in software, but it would be slower than other algorithms which do not use 32-bit +.
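To show where those 32-bit additions come from, here is a rough Python sketch of the ChaCha quarter round as specified in RFC 7539. The helper names `rotl32` and `quarter_round` are my own. Each quarter round performs four additions modulo 2^32, interleaved with XORs and fixed rotations.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, count):
    """Rotate a 32-bit value left by `count` bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def quarter_round(a, b, c, d):
    """ChaCha quarter round on four 32-bit words (RFC 7539, section 2.1)."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)  # 32-bit add, XOR, rotate
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d
```

XOR and a fixed rotation cost almost nothing in hardware (a rotation is just wiring), so the adders are the only part with a carry chain.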

> To implement 32-bit + in hardware you need 31 full adders and one half adder, each of which uses multiple gates and depends on the result of the previous adder.

This is not how CPUs typically implement addition, or other ALU operations. Carry-lookahead adders have existed since the 1950s: https://en.wikipedia.org/wiki/Carry-lookahead_adder
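As a rough illustration of the idea, here is a Python sketch of a parallel-prefix (Kogge-Stone style) adder, which is one member of the carry-lookahead family. It is not a claim about what any particular CPU uses, and the name `prefix_add` is made up for this example. Every bit gets a "generate" and a "propagate" signal, and the carries are then combined in log2(width) rounds instead of `width` sequential steps.

```python
def prefix_add(x, y, width=32):
    """Add two width-bit integers using generate/propagate prefix rounds.

    Returns (sum modulo 2**width, final carry-out).
    """
    mask = (1 << width) - 1
    generate = x & y & mask       # bit i creates a carry on its own
    propagate = (x ^ y) & mask    # bit i passes an incoming carry along
    half_sum = propagate          # per-bit sum before carries are applied

    shift = 1
    while shift < width:
        # Merge each group with the group `shift` bits below it.
        generate = (generate | (propagate & (generate << shift))) & mask
        propagate = (propagate & (propagate << shift)) & mask
        shift *= 2

    # generate bit i is now the carry out of bit i; move it up one
    # position to get the carry into bit i + 1.
    carries = (generate << 1) & mask
    carry_out = (generate >> (width - 1)) & 1
    return half_sum ^ carries, carry_out
```

For a 32-bit add the loop runs 5 times (shift = 1, 2, 4, 8, 16), compared with a chain of 32 dependent stages in ripple-carry. The price in hardware is more gates and wiring, which is the speed-versus-area trade-off being discussed in this thread.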

Thank you, I love this citation so much.

> Charles Babbage recognized the performance penalty imposed by ripple-carry and developed mechanisms for anticipating carriage in his computing engines.