
Codex just barely surpassed it on easy questions but did worse on harder ones. AlphaCode is significantly better on harder questions, but significantly worse on easy questions. That isn't extremely fast development; they are mostly moving sideways, where trying to improve one part of the metric hurts the others.

https://paperswithcode.com/sota/code-generation-on-apps

Development in these areas was very fast in the 3 years between when transformer networks were invented and roughly when GPT-3 was done. But in the 3 years since GPT-3 not much has changed. We've seen a lot of "we applied a large network to a new problem and found X" since then, but that isn't new performance; it's just a new result with the same thing we had back then.