|
|
|
|
|
by energy123
409 days ago
|
|
This is my view. We've seen this before in other problems where there's an on-hand automatic verifier. The nature of the problem mirrors previously solved problems. The LLM skeptics need to point out what differs with code compared to Chess, DoTA, etc from a RL perspective. I don't believe they can. Until they can, I'm going to assume that LLMs will soon be better than any living human at writing good code. |
|
An obviously correct automatable objective function? Programming can be generally described as converting a human-defined specification (often very, very rough and loose) into a bunch of precise text files.
Sure, you can use proxies like compilation success / failure and unit tests for RL. But key gaps remain. I'm unaware of any objective function that can grade "do these tests match the intent behind this user request".
Contrast with the automatically verifiable "is a player in checkmate on this board?"