|
Yes, any operation is easy in 2^(2^n). For instance, take addition of two 128-bit numbers x and y (seen as 64-bit int arrays) on a 64-bit big-endian CPU: sum[1] = x[1] + y[1]
sum[0] = x[0] + y[0] + carry from previous operation
In contrast, if you'd use 96 bits, you couldn't just use 64 bit integer operations. Instead, you'd have to cast a lot: sum[4..11] = *((int64*) x) + *((int64*) y)
sum[0..3] = (int32) ( (int64) *((int32*) x) + (int64) *((int32*) y) + carry)
So you'd read 32 bit-values into 64 bit registers, set the top 32 bits to zero, perform the addition, and then write out a 32bit value again.It gets much worse if your CPU architecture does not support the addition to 2^(2^n); if you were to use 100 bits, you'd have to AND the values with a bitmask, and write out single bytes. So 128 is far easier to implement, faster on many CPU architectures, plus you get the peace of mind that your code works for a long time. For instance, let's assume the lower bound of 9 months per doubling (which is unrealistic as described in this article), then you're going to hit: 50 bits (baseline from article): 2004
64 bits: 2014
80 bits: 2026
92 bits: 2035
100 bits: 2040
128 bits: 2062
Now, what's the expected lifetime of a long-term storage system? It's well-known that the US nuclear force uses 8 inch floppy disks. Those were designed around 1970. So a lifetime of roughly 50 years is to be expected. For ZFS, that would be 2054. By this (admittedly very conservative) calculation, 128 bits is only barely more than required. |
For instance, consider this C code for adding two 96-bit numbers on a 64-bit machine (ignoring carry for now):
The purpose of the mark() function is to make it easier to see the code for the additions in the assembly output from the compiler. Here is what "cc -S -O3" (whatever cc comes with MacOS High Sierra) produces for my 64-bit Intel Core i5 for the parts that actually do the math: I'm not too familiar with x86-64 assembly, but I am assuming that this could be made to handle carry by changing the "addl" to whatever the 32-bit version of adding with carry is.Taking out the (uint32_t * ) casts to turn the C code from 96-bit adding into 128-bit adding generates assembly code that only differs in that both movl instruction become movq instructions, and addl becomes addq.
So, if you were writing in C it looks like a 96-bit add would be a little uglier than a 128-bit add because of the casts but isn't slower or bigger under the hood. But note that this is assuming accessing the 96-bit number as an array of variable sized parts. It's that assumption that introduces the need for ugly casts.
If a struct is used, then there is no need for casts:
This generates the same code as the earlier version.(I still have no idea how to handle the carry in C, or at least no idea that is not ridiculously inefficient. When I've implemented big integer libraries I've either used a type for my "digits" that is smaller than the native integer size so that I could detect a carry by a simple AND, or I've handled low level addition in assembly).