|
|
|
|
|
by Majromax
539 days ago
|
|
> What good it's it to train a model basically on itself? If the model generates data of variable quality, and if there's a good way to distinguish good data from bad data, then training on self-generated data might "bootstrap" a model to better performance. This is common in reinforcement learning. Famously, AlphaGo Zero (https://en.wikipedia.org/wiki/AlphaGo_Zero) learned exclusively on self-play, without reference to human-played games. Of course, games have a built-in critic: the better strategy usually wins. It's much harder to judge the answer to a math problem, or decide which essay is more persuasive, or evaluate restaurant recommendations. |
|