

by in-silico 83 days ago

What about things like AlphaZero and Atari gameplay, where the model has zero prior knowledge and learns superhuman ability purely using RL?

With sufficient RL sampling/training, there's no reason an LLM couldn't similarly develop entirely new skills, especially in verifiable domains like math and code.

> It simply alters the probabilities.

Yes? What else would a learning system do besides alter its behavior? (And you can just decode with argmax or sample pseudo-randomly if you think probabilities are a problem.)
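To make that concrete, here is a toy sketch, not how any real LLM or AlphaZero is trained. A softmax policy starts uniform (no prior knowledge), gets a reward only when a checker says its answer is right, and is updated with plain REINFORCE. The task, the names (`train`, `greedy`, `sample`), and the hyperparameters are all made up for illustration. All the update does is alter probabilities, and that is enough for the policy to learn the task.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Pseudo-random decoding: draw an action according to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy(probs):
    """Argmax decoding: always pick the most probable action."""
    return max(range(len(probs)), key=probs.__getitem__)


def train(num_actions=5, correct=3, steps=2000, lr=0.5, seed=0):
    """Toy REINFORCE loop with a verifiable 0/1 reward (hypothetical task)."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # uniform start: zero prior knowledge
    for _ in range(steps):
        probs = softmax(logits)
        action = sample(probs, rng)
        reward = 1.0 if action == correct else 0.0  # the "verifier"
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(num_actions):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


probs = train()
print("learned probabilities:", [round(p, 3) for p in probs])
print("argmax decode:", greedy(probs))
print("sampled decode:", sample(probs, random.Random(1)))
```

After training, nearly all the probability mass sits on the rewarded action, so argmax decoding returns it deterministically and sampling returns it almost every time.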