Hacker News new | ask | show | jobs
by jdewerd 530 days ago
> The prime-counting function approximation tells us there are Li(x) primes less than x, which works out[5] to one prime every 354 odd integers of 1024 bits.

Rule of thumb: Want a 1024-bit prime? Try 1024 1024-bit candidates and you'll probably find one. Want a 4096-bit prime? Try 4096 4096-bit candidates and you'll probably find one.

The approximate spacing of primes around p is ln(p), so ln(2^1024) = 1024*ln(2), and ln(2)=0.693 so if you are willing to absorb 0.693 into your rule of thumb as a safety margin you get the delightfully simple rule of thumb above. Of course, you'll still want to use a sieve to quickly reject numbers divisible by 2, 3, 5, 7, etc, and this easily rejects 90% of numbers, and then do a Fermat primality test on the remainders (which if you squint is sort of like "try RSA, see if it works"), and then do Miller-Rabin test to really smash down the probability that your candidate isn't prime. The probabilities can be made absurdly small, but it still feels a bit scandalous that the whole thing is probabilistic.

EDIT: updated rule of thumb to reflect random candidate choice rather than sequential candidate choice.

3 comments

It's been a while since I've looked at the literature on RSA prime generation, but I seem to remember that picking a random starting point and iterating until you find a prime is discouraged because primes aren't evenly distributed so key generation timing could reveal some information about your starting point and eventual prime choice.

I'm not sure how realistic of an issue this is given the size of the primes involved. Even if an attacker can extract sensitive enough timing information to figure out exactly how many iterations were required to find a 1024 bit prime from a 1204 bit random starting point, I'm not aware of a good way to actually find either value. You do also introduce a bias since you're more likely to select prime numbers without a close neighbor in the direction you are iterating from, but again I'm not sure how practical an attack on this bias would be.

Still, to avoid any potential risk there I seem to remember best practice being to just randomly generate numbers of the right size until you find a prime one. With the speed of modern RNGs, generating a fresh number each time vs iterating doesn't seem like a significant penalty.

Yes, excellent point! I originally omitted this detail for simplicity, but on reflection I don't think it actually achieved much in the way of simplifying the rule so I changed it to reflect reality. Thanks for pointing that out.

EDIT: the rush of people offering up sieve optimizations is pushing me back towards formulating the rule of thumb on a consecutive block of numbers, since it makes it very clear that these are not included, rather than implicitly or explicitly including some subset of them (implicit is bad because opacity, explicit is bad because complexity).

I've seen many implementations that relied on iterating (or rather, they used a prime sieve but it amounts to the same thing in terms of the skewed distribution). While maybe not a problem in practice I always hated it - even for embedded systems etc. I always used pure rejection sampling, with a random one in each iteration.
it might be this https://facthacks.cr.yp.to/fermat.html

if N=p*q and p-q < sqrt(p) then its easy to factor

Encountering this by chance is exceedingly unlikely of course, if p and q are randomly generated. In probability terms it amounts to the first half of p (or q) all being zero (apart from a leading 1) so roughly 2^(-n/4) where n is the bit size of n. So for RSA 2048 the probability of this happening is on the order of 2^-512, or in base-10 terms 0.0000000...0000001, with roughly 150 zeros before the one!
> Rule of thumb: Want a 1024-bit prime? Try 1024 1024-bit candidates and you'll probably find one.

Where probably is 76% [1], which is not that high depending on what you are doing. For example, you wouldn't be ok with GenerateKey failing 24% of the time.

To get a better than even chance, 491 [2] 1024-bit candidates are enough.

[1]: https://www.wolframalpha.com/input?i=1+-+%281+-+li%282%5E102... (using li(x) as a slightly better approximation of π(x) than x/ln(x), see [3])

[2]: https://www.wolframalpha.com/input?i=1+-+%281+-+li%282%5E102...

[3]: https://en.wikipedia.org/wiki/Prime-counting_function

Iterating over some huge search space in an essentially sequential manner is generally not going to be nearly performant as simply selecting an odd number at random. You could try using a generating polynomial instead such as f(x) = x^2 + x + 41 but even that isn't going to help much in the long run. (There are Diophantine equations which one day may prove useful for generating random primes however AFAICT finding efficient solutions is still currently considered a hard problem.)
Yes, but the more we mix sieve rejection into candidate selection the more we complicate the rule of thumb. "Reject even numbers as prime candidates" is probably OK to leave as an exercise for the reader, as is the equivalent "round every candidate to odd" optimization. The point about random vs sequential is well taken, though, and it doesn't complicate the rule of thumb, so I changed it.
Incrementing a bignum is faster than another PRNG cycle.
Neither is a significant amount of the time required to reject a candidate factor. The cheapest rejection test is "Bignum Division by 3" and something like 2/3 candidates will need more expensive further tests.

https://news.ycombinator.com/item?id=40093136