|
|
|
|
|
by in-silico
36 days ago
|
|
What about things like AlphaZero and Atari gameplay, where the model has zero prior knowledge and learns superhuman ability purely using RL? With sufficient RL sampling/training, there's no reason an LLM couldn't similarly develop entirely new skills, especially in verifiable domains like math and code. > It simply alters the probabilities. Yes? What else would a learning system do besides alter its behavior? (and you can just sample with argmax or pseudo-randomly of you think probabilities are a problem) |
|