Hacker News
by rvz 472 days ago
Note: What makes this interesting is that it is a pre-LLM project, which shows that some projects don't need an "LLM" at all. All you need is a plain old reinforcement learning algorithm and a deep neural network, which is perfect for this.

This is what I want to see more of, and it goes against the hype of LLMs. What a great RL project.

Meanwhile, "Claude" is still stuck somewhere in the game. Imagine the costs of running that vs this project.

1 comment

Claude 3.7 recently failed to finish Pokemon after getting stuck in a corner and deciding it was impossible to get out.
Not our agents. A hierarchical approach would be superior. Add RL to Claude and it's gg.