|
|
|
|
|
by AnIrishDuck
412 days ago
|
|
> The LLM skeptics need to point out what differs with code compared to Chess, DoTA, etc from a RL perspective. An obviously correct automatable objective function? Programming can be generally described as converting a human-defined specification (often very, very rough and loose) into a bunch of precise text files. Sure, you can use proxies like compilation success / failure and unit tests for RL. But key gaps remain. I'm unaware of any objective function that can grade "do these tests match the intent behind this user request". Contrast with the automatically verifiable "is a player in checkmate on this board?" |
|
So, it doesn't map cleanly onto previously solved problems, even though there's a decent amount of overlap. But I'd like to add a question to this discussion:
- Can we design clever reward models that punish bad architectural choices, executing on unclear intent, etc? I'm sure there's scope beyond the naive "make code that maps input -> output", even if it requires heuristics or the like.