by dcre
140 days ago
This is dead wrong: essentially all of the huge gains in coding performance in the past year have come from RL, not from new sources of training data. I echo the other commenters that proprietary code isn't any better, and it doesn't matter anyway, because when you use an LLM to work on proprietary code, it has the code right there.
The quality of the existing code base makes a huge difference. On a recent greenfield effort, Claude emitted an MVP that matched the design semantics, but the code was not up to standards. For example, it loaded the same large file into memory separately at each place it was needed, rather than loading it once and passing a reference.
However, after an early refactor, the subsequently generated code improved vastly. It now honors the testing and performance paradigms, and it's so clean there's nothing for the linter to do.
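
To make the file-loading complaint concrete, here is a rough sketch of the two patterns. Everything in it (the JSON layout, `load_dataset`, `summarize`, and the helper names) is made up for illustration and is not the actual project code.

```python
import json
from pathlib import Path


def load_dataset(path: Path) -> dict:
    """Read and parse the file. Hypothetical format: {"records": [{"value": ...}]}."""
    with path.open() as f:
        return json.load(f)


# Pattern described above: every helper re-reads and re-parses the file.
def count_records_reloading(path: Path) -> int:
    return len(load_dataset(path)["records"])


def total_reloading(path: Path) -> int:
    return sum(r["value"] for r in load_dataset(path)["records"])


# After the refactor: helpers take the already-loaded data.
def count_records(data: dict) -> int:
    return len(data["records"])


def total(data: dict) -> int:
    return sum(r["value"] for r in data["records"])


def summarize(path: Path) -> dict:
    data = load_dataset(path)  # loaded exactly once
    # Passing `data` hands over a reference to the same dict, not a copy.
    return {"count": count_records(data), "total": total(data)}
```

Both versions give the same answers; the second just does the disk read and parse once, and once that pattern is established in the code base, later generated code tends to follow it.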