HN Mirror



	by jeremysalwen 168 days ago
	As someone who has implemented some RL algorithms (including all the ones mentioned in the article) and applied them to a real-world game, I would be surprised if the implementation is not buggy. That is one of the most striking things about RL: how hard it is to find bugs, since they generally only degrade performance instead of causing a crash or obviously wrong behavior. The fact that he doesn't mention spending a massive amount of time debugging, along with the longish list of things that were tried that really should have worked but didn't, suggests to me it's probably still buggy. I suppose it is possible that LLMs are particularly good at RL code, since they've seen it repeated so many times... But I would be skeptical without hard evidence.
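To make the "bugs only degrade performance" point concrete, here is a minimal hypothetical Python sketch (not taken from the article or its code). It assumes a common setup where per-step rewards and `done` flags are collected into flat lists. The buggy helper forgets to reset the running return at episode boundaries: it never crashes and produces plausible numbers, but it leaks reward from one episode into the previous one, which would only show up as worse learning.

```python
from typing import List


def discounted_returns(rewards: List[float], dones: List[bool], gamma: float = 0.99) -> List[float]:
    """Discounted return per step, resetting at episode boundaries.

    dones[t] is True when step t ended its episode, so nothing after t
    should flow back into returns[t].
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # episode ended at t: drop the later episode's return
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def discounted_returns_buggy(rewards: List[float], dones: List[bool], gamma: float = 0.99) -> List[float]:
    """Same loop with the reset forgotten: runs fine, silently leaks reward across episodes."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # BUG: ignores dones[t]
        returns[t] = running
    return returns


rewards = [1.0, 1.0, 1.0, 1.0]
dones = [False, True, False, True]  # two 2-step episodes
print(discounted_returns(rewards, dones, gamma=0.5))        # → [1.5, 1.0, 1.5, 1.0]
print(discounted_returns_buggy(rewards, dones, gamma=0.5))  # → [1.875, 1.75, 1.5, 1.0]
```

Both versions return lists of the right length with finite values, so nothing downstream complains; the only symptom is an agent that learns less well than it should.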


nkaz123 168 days ago

I accepted the bugginess in the browser game as unavoidable, and probably had too much faith in the LLM implementations, but I did a bit more troubleshooting than I mentioned. The progressive improvement over episodes (and the intuitive result that PPO outperformed the others) gave me some confidence, and I've since used a similar setup on 2048, with more results showing improvement over episodes: https://wandb.ai/noahpunintended/2048-raspberry-pi?nw=nwuser...
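The "improvement over episodes" sanity check can be sketched as comparing average returns early and late in training. This is a hypothetical Python helper (the name `early_vs_late_gap` and the default window are assumptions, not from the linked runs); a positive gap is weak evidence at best, since, as the parent comment points out, a subtly buggy agent can still improve.

```python
from statistics import mean
from typing import Sequence


def early_vs_late_gap(episode_returns: Sequence[float], window: int = 10) -> float:
    """Mean return of the last `window` episodes minus that of the first `window`."""
    if window <= 0 or len(episode_returns) < 2 * window:
        raise ValueError("need window > 0 and at least 2 * window episodes")
    early = mean(episode_returns[:window])
    late = mean(episode_returns[-window:])
    return late - early


returns = [float(r) for r in range(20)]  # toy, steadily improving curve
print(early_vs_late_gap(returns, window=5))  # → 15.0
```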
