by porridgeraisin 1 day ago
RLVR still does not expand beyond the base distribution, though; it only mode-seeks within it.

I.e., evaluation and retention: yes. Variation, or "planning": no.
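A toy way to picture the mode-seeking claim, assuming the idealized KL-regularized view of RL fine-tuning (where the optimum is the base distribution reweighted by exp(reward / beta)). The outputs, rewards, and `beta` value here are made up for illustration; this is a sketch of the argument, not a description of any real training run:

```python
import math

def reweight(base, reward, beta):
    """Closed-form optimum of KL-regularized reward maximization:
    q(x) is proportional to p(x) * exp(r(x) / beta)."""
    unnorm = {x: p * math.exp(reward[x] / beta) for x, p in base.items()}
    z = sum(unnorm.values())
    return {x: w / z for x, w in unnorm.items()}

# Hypothetical base model: "c" is a correct answer it never produces.
base = {"a": 0.7, "b": 0.3, "c": 0.0}
reward = {"a": 0.0, "b": 1.0, "c": 1.0}

tuned = reweight(base, reward, beta=0.1)
# Mass moves from "a" to "b" (sharpening within the support),
# but "c" stays at exactly zero: nothing outside the support is discovered.
```

Under this assumption, reward can only redistribute mass the base model already had.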

That is not to say you cannot use LLMs for this. AlphaEvolve does exactly that: it pairs the LLM with a simple external evolutionary planner. The overarching point he's making is that our planner is still "dumb" and we need to work on it.
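Roughly the shape of that setup, as a minimal sketch and not AlphaEvolve's actual code. `propose` is a hypothetical stand-in for the LLM call (here just a random tweak), and `evaluate` is a toy verifiable score; the loop around them is the "dumb" external planner:

```python
import random

def propose(parent, rng):
    """Hypothetical stand-in for an LLM call that returns a variation of parent."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

def evaluate(candidate):
    """Toy verifiable score: negative squared distance to a fixed target."""
    target = [3, -2, 5]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def evolve(seed, generations=200, population_size=8, children_per_parent=4, rng_seed=0):
    """External planner: vary (propose), evaluate, retain the best."""
    rng = random.Random(rng_seed)
    population = [seed]
    for _ in range(generations):
        children = [propose(p, rng)
                    for p in population
                    for _ in range(children_per_parent)]
        # Parents compete with children, so the best score never gets worse.
        population = sorted(population + children, key=evaluate, reverse=True)
        population = population[:population_size]
    return population[0]

best = evolve([0, 0, 0])
```

The planner itself knows nothing about the problem; all the search direction comes from the scoring function and the truncation step.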

When you iteratively guide an LLM in Claude Code, you are the external planner. That also works.