| Don't 64-bit CPUs usually have efficient instructions for operating on narrower values? For instance, consider this C code for adding two 96-bit numbers on a 64-bit machine (ignoring carry for now):
#include <stdint.h>
extern void mark(void);
int sum(uint64_t * a, uint64_t * b, uint64_t * c)
{
    mark();
    /* low 64 bits */
    *c++ = *a++ + *b++;
    mark();
    /* high 32 bits */
    *(uint32_t *)c = *(uint32_t *)a + *(uint32_t *)b;
    mark();
    return 17;
}
The purpose of the mark() function is to make it easier to see the code for the additions in the assembly output from the compiler. Here is what "cc -S -O3" (whatever cc comes with macOS High Sierra) produces for my 64-bit Intel Core i5 for the parts that actually do the math:
callq _mark
movq (%rbx), %rax
addq (%r15), %rax
movq %rax, (%r14)
callq _mark
movl 8(%rbx), %eax
addl 8(%r15), %eax
movl %eax, 8(%r14)
callq _mark
I'm not too familiar with x86-64 assembly, but I am assuming that this could be made to handle carry by changing the "addl" to whatever the 32-bit version of adding with carry is.
Taking out the (uint32_t *) casts to turn the C code from 96-bit adding into 128-bit adding generates assembly code that differs only in that both movl instructions become movq instructions, and addl becomes addq. So, if you were writing in C, it looks like a 96-bit add would be a little uglier than a 128-bit add because of the casts, but it isn't slower or bigger under the hood.
But note that this assumes the 96-bit number is accessed as an array of variable-sized parts. It's that assumption that introduces the need for ugly casts. If a struct is used, then there is no need for casts:
#include <stdint.h>
typedef struct {
    uint64_t low;   /* low 64 bits */
    uint32_t high;  /* high 32 bits */
} addr;
extern void mark(void);
int sum(addr * a, addr * b, addr * c)
{
    mark();
    c->low = a->low + b->low;
    mark();
    c->high = a->high + b->high;
    mark();
    return 17;
}
This generates the same code as the earlier version.
(I still have no idea how to handle the carry in C, or at least no idea that is not ridiculously inefficient. When I've implemented big integer libraries, I've either used a type for my "digits" that is smaller than the native integer size so that I could detect a carry with a simple AND, or I've handled low-level addition in assembly.) |
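For what it's worth, here is a rough sketch of two ways the carry could be handled in plain C. Nothing in it comes from the code above: the names addr96, add96, and add_digits are made up for illustration. The first function relies on the fact that unsigned addition wraps, so the low sum is smaller than an operand exactly when a carry occurred. The second is one possible reading of the "digits smaller than the native integer size" approach, using 32-bit digits summed in 64-bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical 96-bit type, mirroring the struct in the post. */
typedef struct {
    uint64_t low;
    uint32_t high;
} addr96;

/* Add two 96-bit values. A carry out of bit 95 is silently dropped. */
addr96 add96(addr96 a, addr96 b)
{
    addr96 r;
    r.low = a.low + b.low;
    /* Unsigned addition wraps around, so the sum is smaller than
       an operand exactly when the low word produced a carry. */
    uint32_t carry = (r.low < a.low);
    r.high = a.high + b.high + carry;
    return r;
}

/* Narrow-digit approach (assumed layout: 32-bit digits, least
   significant first). Each digit sum fits in 64 bits, so the carry
   is whatever lands above bit 31. Returns the final carry out. */
uint32_t add_digits(uint32_t *out, const uint32_t *a,
                    const uint32_t *b, int n)
{
    uint64_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        out[i] = (uint32_t)(t & 0xFFFFFFFFu);  /* keep the low 32 bits */
        carry = t >> 32;                       /* 0 or 1 */
    }
    return (uint32_t)carry;
}
```

Whether a particular compiler turns the comparison in add96 into an add-with-carry instruction is something to check in the "cc -S" output rather than assume.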