Hacker News
by crthpl 35 days ago
Yes it can! That's the whole point of RL! It generates slightly out-of-distribution rollouts and rewards the good ones, which shifts the distribution of the output.
1 comment

That's not out of distribution; that's inside the distribution of the rollouts. If you don't create rollouts for chess, then it doesn't know how to play chess, no matter how smart it is at the tasks you did create rollouts for. It's structurally stuck in its distribution.
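
To make the disagreement concrete, here is a toy sketch, not anyone's actual training setup. It assumes a three-action softmax policy trained with plain REINFORCE (no baseline); the action layout, rewards, learning rate, and step count are all made up for illustration. Action 2 starts rare but is sampled sometimes, so reward can amplify it. Action 0 starts at exactly zero probability (a stand-in for "no rollouts were ever created for it"), so it is never sampled and never updated, however high its reward would be.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities; a logit of -inf maps to exactly 0.0."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(logits, rewards, steps=5000, lr=0.1, seed=0):
    """Plain REINFORCE on a bandit: sample an action, scale the log-prob gradient by its reward."""
    rng = random.Random(seed)
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        # The "rollout": sample one action from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rewards[action]
        # d/d logit_i of log pi(action) is 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


# Hypothetical setup (all values are illustrative assumptions):
#   action 0: zero initial probability, would pay the most if it were ever tried
#   action 1: common, low reward
#   action 2: rare but reachable, high reward
initial_logits = [-math.inf, 2.0, -2.0]
rewards = [10.0, 0.1, 1.0]

before = softmax(initial_logits)
after = train(initial_logits, rewards)

print("before:", [round(p, 4) for p in before])
print("after: ", [round(p, 4) for p in after])
```

In this toy, both comments hold at once: sampling plus reward moves probability mass toward the rare-but-reachable action, while the zero-probability action stays at zero because its gradient term is always multiplied by a probability of 0. Whether a real model's untrained tasks sit at "rare" or at "effectively zero" is the part the thread is actually arguing about.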