This looks like a really phenomenal writeup, with worked examples. The underlying math and algorithm stuff here is applicable to other attacks; it looks like this is worth a close read.
Can you say whether libgcrypt did something especially stupid? [1]
I understand that most people just take the ref10 code from supercop, but if I were to try to implement ed25519 from the paper (https://ed25519.cr.yp.to/papers.html), what is the chance I would do something like what libgcrypt did, or equally bad?
Basically, is ed25519 secure because everyone uses a known secure implementation or because it is engineered to be genuinely hard to implement incorrectly from the paper? (I know it's both, but is it mostly one or the other?)
1 - They special cased the point at infinity and this short-circuit allowed to count leading zeros.
If my two cents count, I would say that if you were to implement EdDSA from the paper, you would have a good chance of creating a secure implementation w.r.t. to this kind of leakage.
However, if you were starting with some Short-Weierstrass EC code in your library, then you might be inclined to skip all the scalar multiplication specific stuff in the Ed25519 paper, just take some (incomplete) Edwards formulas, take some general scalar multiplication algo (or even reuse the one you have for Short-Weierstrass, like libgcrypt) and end up with a vulnerable EdDSA (if your ECDSA was).
The short-circuiting in the addition formulas is necessary if incomplete formulas are used. Either that is done, or the scalar multiplication algorithm has to explicitly find out the bit-length and start so that the point at infinity is not input into them ever.
In https://minerva.crocs.fi.muni.cz/#details-reasons and in your comment you're describing two bad ways to do scalarmult, but you do not show side-by-side the correct way (e.g. what ref10 does). Can you describe it in a sentence or two?
Right, there are many ways to do it correctly. In general, you need complete addition formulas (those that can take the point at infinity and produce a correct result in a side-channel indistinguishable way), if you have those, almost any scalar multiplication algo can be made constant time with regards to the scalar bit-length (you would just start at some fixed point past the end of the scalar in case of a left-to-right algo, with the point at infinity intialized).
One of the main points in the root causes discussion is that without complete formulas you are almost always going to leak the bit-length, so the only way to not be vulnerable in that case is to fix the bit-length to a constant value. This cannot be done naively by simply setting the high bit, because this would introduce bias in the nonces which would be exploitable even without measuring the duration, a much worse attack! However it can be done via the method suggested by Brumley & Tuveri in https://eprint.iacr.org/2011/232 where you add a multiple of the curve order to the scalar to fix its bit-length. This means the distribution of the nonce modulo the order remains the same (uniform, no bias) yet the bit-length used in scalarmult as a loop bound is constant.
Thanks, a paper is being prepared with the full details and an improved method. The sensitivity of the method to noise (one bad inequality in the lattice can make it not find the key) is really worth looking at, as that would improve the number of signatures necessary for the attack severely.
I understand that most people just take the ref10 code from supercop, but if I were to try to implement ed25519 from the paper (https://ed25519.cr.yp.to/papers.html), what is the chance I would do something like what libgcrypt did, or equally bad?
Basically, is ed25519 secure because everyone uses a known secure implementation or because it is engineered to be genuinely hard to implement incorrectly from the paper? (I know it's both, but is it mostly one or the other?)
1 - They special cased the point at infinity and this short-circuit allowed to count leading zeros.