Hacker News new | ask | show | jobs
by repelsteeltje 381 days ago
Yes. Another approach would be to use regular 64 bit chunks and speculatively execute each add with and without carry in parallel. Then select the correct variant based on carry result of less significant addition.

With double the amount of additions this allows for log(bits) propagation time (versus linear)

3 comments

Wouldn’t that produce 2^n possible results to choose from, where n is the number of chunks? That seems like a lot of additional (he-he) instructions executed.
Nope. Just 2n: each chunk pair is added once without carry, and once won't carry=1.

For as long as radix=2, you either have a carry or you don't.

For a single addition, the radix is irrelevant, the carry is always zero or one: (r-1) + (r-1) = 2r - 2 < 2r.
But for a single addition you will also not save any time because you still need to do the select in series. The whole point of the radix 2^51 trick is that you can defer the normalization over several additions, but in order to do that your carry needs to be more than a single bit.
There's not just "result with carry" and "result without carry" but rather one variant of that per word of the input

... Which likely isn't that bad to code up.