by jmmcd
53 days ago
> Since these companies can’t improve their AI models without fresh data created by human beings

Totally wrong. Self-play dates back to Arthur Samuel in the 1950s, and RL with verifiable rewards is a key part of training the most advanced models today.

Right now there are companies that hire software devs or data scientists just to solve a bunch of random problems so they can generate training data for an LLM. Why would they be in business if self-play works so well?