|
|
|
|
|
by pittma
386 days ago
|
|
Cool stuff! This method is very similar to how AVX-512-optimized RSA implementations work too, as they also have to do Very Large Exponentiations. This paper[1] covers how RSA does its windowing, which includes the formula showing how the window size can be arbitrary. One additional thing AVX-512 RSA implementations do, though, is store the results of the multiplications for the range [0..2^{window-size}) in a table; then, for each window, that result is retrieved from the table[2] and only the shifting/repositioning has to be done. 1. https://dpitt.me/files/sime.pdf (hosted on my domain because it's pulled from a journal) 2. https://github.com/aws/aws-lc/blob/9c8bd6d7b8adccdd8af4242e0... |
|
Separately, I'm wondering whether the carries really need to be propagated in one step. (At least I think that's what's going on?) The chance that a carry in leads to an additional carry out beyond what's already there in the high 12 bits is very small, so in my code, I assume that carries only happen once and then loop back if necessary. That reduces the latency in the common case. I guess with a branch there could be timing attack issues though