by jeremysalwen
168 days ago
As someone who has implemented several RL algorithms (including all the ones mentioned in the article) and applied them to a real-world game, I would be surprised if the implementation is not buggy. One of the most striking things about RL is how hard it is to find bugs, since they generally only degrade performance instead of causing a crash or obviously wrong behavior. The fact that he doesn't mention a massive amount of time spent debugging, or a longish list of things that were tried that really should have worked but didn't, suggests to me it's probably still buggy. I suppose it is possible that LLMs could be particularly good at RL code, since they have seen it repeated so many times... but I would be skeptical without hard evidence.
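
To illustrate the kind of silent bug meant here, below is a minimal sketch. It is not taken from the article's code: the function name `discounted_returns` and the `reset_on_done` flag are made up for this example. It computes discounted returns over a batch that contains two episodes. Forgetting to reset at episode boundaries does not crash or produce NaNs; it just leaks reward from the next episode into the previous one, which would likely show up only as worse training.

```python
def discounted_returns(rewards, dones, gamma=0.99, reset_on_done=True):
    """Compute discounted returns for a flat batch of transitions.

    dones[t] is True when step t is the last step of an episode.
    reset_on_done=False reproduces the (hypothetical) bug: the running
    return is carried across episode boundaries.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return can reuse the one after it.
    for t in reversed(range(len(rewards))):
        if reset_on_done and dones[t]:
            running = 0.0  # nothing after a terminal step belongs to this episode
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Two episodes of two steps each, reward 1.0 at every step.
rewards = [1.0, 1.0, 1.0, 1.0]
dones = [False, True, False, True]

correct = discounted_returns(rewards, dones, gamma=0.5)
buggy = discounted_returns(rewards, dones, gamma=0.5, reset_on_done=False)

print(correct)  # [1.5, 1.0, 1.5, 1.0]
print(buggy)    # [1.875, 1.75, 1.5, 1.0]
```

Both versions run cleanly and produce plausible-looking numbers; only the first episode's targets are inflated. Nothing in a loss curve points at this line, which is why a lack of reported debugging time is a warning sign.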