Hacker News
by oxxoxoxooo 1364 days ago
> This instruction is used in some bignum code

Could you be more specific? I think for that to work one would also need the upper half of 64x64 multiplication and `vpmullq` provides only the lower half. You could break one 64x64 multiplication into four 32x32 multiplications (i.e. emulate the full 64x64 = 128 bits multiplication) but I was under the impression that this was slow.
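For reference, the four-multiplication emulation mentioned above looks roughly like this in scalar C. This is only a sketch of the schoolbook decomposition, not code taken from any particular bignum library; the `u128_parts` type and the `mul_64x64_128` name are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit product stored as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Full 64x64 -> 128-bit unsigned multiply built from four 32x32 -> 64-bit
 * partial products (schoolbook method). */
static u128_parts mul_64x64_128(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p00 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p01 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p10 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p11 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Middle column: three values below 2^32 each, so the sum fits in 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

Only the `lo` half corresponds to what `vpmullq` returns per lane; the `hi` half is the part that needs the extra multiplications and carry handling.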

1 comment

I assume that, as you say, whoever used this instruction was using it to multiply 32-bit numbers.

On AMD Zen 4 and Intel Cannon Lake or newer (when AVX-512 is supported), the fastest method to multiply big numbers is to use the IFMA instructions, which reuse the floating-point multipliers to generate 104-bit products of 52-bit numbers.
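To make the 52-bit arithmetic concrete, here is a scalar model of what one 64-bit lane of the two IFMA multiply-add instructions (`VPMADD52LUQ` and `VPMADD52HUQ`) computes. It is a sketch for illustration only: the function names `madd52lo` and `madd52hi` are made up, and it relies on `unsigned __int128`, which is a GCC/Clang extension rather than standard C.

```c
#include <stdint.h>

/* Mask selecting the low 52 bits of a 64-bit lane. */
#define MASK52 ((1ULL << 52) - 1)

/* Model of one VPMADD52LUQ lane: multiply the low 52 bits of a and b,
 * then add the LOW 52 bits of the 104-bit product to the accumulator. */
static uint64_t madd52lo(uint64_t acc, uint64_t a, uint64_t b)
{
    unsigned __int128 prod = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + ((uint64_t)prod & MASK52);
}

/* Model of one VPMADD52HUQ lane: same product, but add the HIGH 52 bits
 * (bits 52..103) to the accumulator. */
static uint64_t madd52hi(uint64_t acc, uint64_t a, uint64_t b)
{
    unsigned __int128 prod = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + (uint64_t)(prod >> 52);
}
```

Because each limb holds only 52 bits in a 64-bit lane, the 12 spare bits let several partial products accumulate before carries have to be propagated.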