Hacker News new | ask | show | jobs
by addaon 381 days ago
> Aside: Why 13 bits instead of 12? For our purposes, we’re going to ignore the carries in the most significant limb, allowing numbers to wrap when they overflow past 2256 - 1 (just like how unsigned addition works in C with normal size integer types). As a result, we can assign 52 bits to the most significant limb and ignore the fact that it will run out of room for carries before the other limbs do.

Why not give the top limb 64 bits and the other four limbs 48 bits each, then? You can accumulate more additions before normalization, you can take advantage of word alignment during splitting and normalization if your instruction set has anything useful there, and your overflow properties are identical, no?

3 comments

>> Why not give the top limb 64 bits and the other four limbs 48 bits each, then?

I think one goal is to use 5 64 bit registers to do 256 bit math. That means using 256/5 = 51.2 bits of each word. That's probably some kind of ideal if you want 256bit math, but not optimal if you're writing a generic big-int library. In the old days you'd want to use exactly one byte for the carry(s) because we didn't have barrel shifters to do arbitrary bit shifts efficiently. In that case I'd use 56 bits of the 64 to get nice byte alignment.

This is all quite relevant for RISC-V since the ISA does not have flags.

Even with this explanation a 64 + 48*4 is clearly superior. You can go longer without overflow (since you have 16 bits of carry space per pseudo-digit), and the amount of carry space is aligned even more nicely.
>That means using 256/5 = 51.2 bits of each word.

Why must each word have the same amount? Why not 64 bits on the top word, and 48 bits on the other 4 words?

Evenly distributing the number of bits per word lets you chain more additions/subtractions before having to normalize.
Sure, but the point is that for the most significant limb, there is no point in having redundant bits because whatever you put in them will be discarded after normalization.
Ah, in that case, you're right, it would make sense to use all 64 bits for the top limb. Still, making them all equal-sized can have benefits if you use SIMD or similar techniques to operate on them uniformly. One project of mine has been trying to work with large integers in CUDA by distributing their limbs across a warp.
Maybe some use cases want to know if there was overflow? In which case having more space for carry bits to accumulate makes it easy to give an accurate answer
Because adding the top limbs of two encoded numbers would overflow too soon. If you set both to 2^63 for example, they overflow immediately. Might be fine for wraparound arithmetic, but not in general.
Setting both to 2^63 means your original 256-bit numbers were 2^255, thus the addition would overflow no matter what intermediate encoding you’re using.
Sure, then set one to 2^62 and the other to -2^62 (namely: 0b1100..00). It's overflow as far as unsigned arithmetic is concerned, but not in the case of signed arithmetic.

That said, when you're dealing with 256-bit integers, you're almost assuredly not working with signed arithmetic.

...so? They don't care about top limb overflow, at all. That's the point.
Then you would need 6 words to hold a 256-bit value instead of 5 in the OP, and consequently more instructions to add them.
64 + 48 * 4 == 256... still just five 64-bit words.
Now you can’t detect overflow?