| HN Mirror


by rvz 472 days ago

Note: What makes this interesting is that it is a pre-LLM project, which shows that some projects don't need an "LLM" at all. All you need is a plain old reinforcement learning algorithm and a deep neural network, which is a perfect fit for this.

This is what I want to see more of, and it goes against the LLM hype. What a great RL project.

Meanwhile, "Claude" is still stuck somewhere in the game. Imagine the costs of running that vs this project.

1 comments

mclau156 472 days ago

Claude 3.7 recently failed to finish Pokemon after getting stuck in a corner and deciding it was impossible to get out.


xinpw8 472 days ago

Not our agents. A hierarchical approach would be superior. Add RL to Claude and it's gg.
