by dependenttypes
2275 days ago
To implement 32-bit + in hardware you need 31 full adders and one half adder, each of which uses multiple gates and depends on the result of the previous adder. Meanwhile, on a CPU, + and bitwise AND tend to take the same number of cycles, and each cycle takes the same amount of time; see https://gmplib.org/~tege/x86-timing.pdf. ChaCha20 in hardware would not be any slower than ChaCha20 in software, but it would be slower than other algorithms that do not use 32-bit +.
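For illustration, here is a minimal Python sketch of the ripple-carry structure described above: one half adder for bit 0 and 31 full adders for bits 1 to 31, each waiting on the carry from the previous stage. The function names (`half_adder`, `full_adder`, `ripple_add32`) are made up for this example, and it models the logic only, not gate delays.

```python
def half_adder(a, b):
    # Sum and carry of two single bits.
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    # Built from two half adders plus an OR gate.
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

def ripple_add32(x, y):
    # Bit 0 needs no carry-in, so a half adder is enough.
    s, carry = half_adder(x & 1, y & 1)
    result = s
    # Bits 1..31: each full adder depends on the previous stage's carry.
    for i in range(1, 32):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result  # the final carry-out is dropped (wraps modulo 2**32)
```

The loop makes the serial dependency visible: the carry for bit 31 cannot be computed until all 31 earlier stages have finished.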
|
This is not how CPUs typically implement addition or other ALU operations. Carry-lookahead adders have existed since the 1950s: https://en.wikipedia.org/wiki/Carry-lookahead_adder
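To make the contrast concrete, here is a rough Python sketch of the generate/propagate idea behind carry-lookahead, written in its parallel-prefix (Kogge-Stone style) form. This is an illustrative model, not a description of any particular CPU's adder; `lookahead_add32` and `MASK` are names invented for this example.

```python
MASK = 0xFFFFFFFF  # keep everything within 32 bits

def lookahead_add32(x, y):
    g = x & y  # bit i generates a carry on its own
    p = x ^ y  # bit i propagates an incoming carry
    G, P = g, p
    shift = 1
    # Five combining stages (shift = 1, 2, 4, 8, 16) instead of a 31-step chain.
    while shift < 32:
        G = G | ((P & (G << shift)) & MASK)
        P = (P & (P << shift)) & MASK
        shift *= 2
    # After the loop, bit i of G says whether bits 0..i produce a carry-out,
    # so the carry into bit i+1 is G shifted left by one.
    carries = (G << 1) & MASK
    return (p ^ carries) & MASK
```

Each stage works on all 32 bit positions at once, so the carry logic depth grows with log2(32) = 5 rather than with the word width, at the cost of more gates.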