by lstmemery
1768 days ago
I have to disagree with you here. In the Codex paper[1], there are two datasets, made up of interview and code competition questions, that Codex got correct only about 3% of the time. From the paper: "Indeed, a strong student who completes an introductory computer science course is expected to be able to solve a larger fraction of problems than Codex-12B." This suggests to me that Codex really doesn't understand anything about the language beyond syntax. I have no doubt that future systems will improve on this benchmark, but they will likely take advantage of the AST and could use unit tests in an RL-like reward function. [1] https://arxiv.org/abs/2107.03374
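To make the "unit tests in an RL-like reward function" idea concrete, here is a minimal sketch of what such a reward might look like. Nothing here comes from the Codex paper: the function name `unit_test_reward`, the default entry point `solution`, and the scoring scheme (0 for unparseable code, otherwise the fraction of tests passed) are all assumptions made up for illustration.

```python
import ast


def unit_test_reward(candidate_source, tests, func_name="solution"):
    """Hypothetical reward in [0, 1] for a generated program.

    tests is a list of (args_tuple, expected_result) pairs.
    """
    # Use the AST as a cheap syntactic gate: code that does not parse gets 0.
    try:
        ast.parse(candidate_source)
    except (SyntaxError, ValueError):
        return 0.0

    # Execute the candidate to define its functions.
    # NOTE: exec on untrusted code is unsafe; a real system would sandbox this.
    namespace = {}
    try:
        exec(candidate_source, namespace)
    except Exception:
        return 0.0

    func = namespace.get(func_name)
    if not callable(func) or not tests:
        return 0.0

    # The reward is the fraction of unit tests the candidate passes.
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests)


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0)]
    print(unit_test_reward("def solution(a, b):\n    return a + b\n", tests))
    print(unit_test_reward("def solution(a, b) return a + b", tests))
```

A partial-credit score like this gives a denser signal than a plain pass/fail, though whether that actually helps training is an open design question rather than something the paper claims.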
In the end, a more general approach with more compute always wins over applying domain knowledge, such as taking advantage of the AST. This is called "the bitter lesson": http://www.incompleteideas.net/IncIdeas/BitterLesson.html