AES is actually a good example of why this doesn’t work in cryptography. Implementing AES without a timing side channel in C is pretty much impossible. Each architecture requires specific and subtle constructions to ensure it executes in constant time. Newer algorithms are designed to not have this problem (DJB was actually the one who popularized this approach).
Okay, I should've said implementing AES in C without a timing sidechannel performantly enough to power TLS for a browser running on a shitty ARMv7 phone is basically impossible. Also if only Thomas Pornin can correctly implement your cipher without assembly, that's not a selling point.
I'm not contesting AES's success or saying it doesn't deserve it. I'm not even saying we should move off it (especially now that even most mobile processors have AES instructions). But nobody would put something like a S-Box in a cipher created today.
If your point is "reference implementations have never been sufficient for real-world implementations", I agree, strongly, but of course that cuts painfully across several of Bernstein's own arguments about the importance of issues in PQ reference implementations.
Part of this, though, is that it's also kind of an incoherent standard to hold reference implementations to. Science proceeds long after the standard is written! The best/safest possible implementation is bound to change.
I don't think it's incoherent. On one extreme you have web standards, where it's now commonplace to not finalize standards until they're implemented in multiple major browser engines. Some web-adjacent IETF standards also work like this (WebTransport over HTTP3 is one I've been implementing recently).
I'm not saying cryptography should necessarily work this way, but it's not an unworkable policy to have multiple projects implement a draft before settling on a standard.