by ck2
1247 days ago
Datasets. The one with the largest, most personal, most obtrusive, most invasive dataset will probably win. The one that has absorbed every podcast, every YouTube video, and every closed-caption text in existence will have the most "complete" answers.
|
What is going to make a difference is running models to generate more text for training, because relying on humans alone doesn't scale. For example, we could use LLMs to do brute-force problem solving and then fine-tune on the solutions that check out.
AlphaZero is the shining example of a model trained on its own generated data and surpassing us at our own game. The self-generated data approach has the potential to reach superhuman levels of performance.
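
To make the "brute force, then fine-tune on solutions" idea concrete, here is a rough toy sketch, not a real pipeline. Everything in it is an assumption for illustration: `propose` is a hypothetical stand-in for sampling an LLM (here it just guesses a random integer), the problems are tiny linear equations, and the fine-tuning step itself is left out. The point is only the loop: sample many candidates, keep the ones an automatic checker accepts, and treat those as new training data.

```python
import random

def propose(problem, rng):
    """Hypothetical stand-in for sampling an LLM: guess a random integer."""
    return rng.randint(-20, 20)

def verify(problem, x):
    """Automatic checker: does x solve a*x + b == c?"""
    a, b, c = problem
    return a * x + b == c

def generate_verified_examples(problems, attempts_per_problem=2000, seed=0):
    """Brute-force each problem and keep only solutions the checker accepts."""
    rng = random.Random(seed)
    dataset = []
    for problem in problems:
        for _ in range(attempts_per_problem):
            x = propose(problem, rng)
            if verify(problem, x):
                # A verified (problem, solution) pair becomes a training example.
                dataset.append((problem, x))
                break
        # Problems with no accepted candidate contribute nothing.
    return dataset

# The last problem has no integer solution, so it can never enter the dataset.
problems = [(2, 3, 11), (5, -1, 14), (3, 0, 7)]
dataset = generate_verified_examples(problems)
for problem, x in dataset:
    print(problem, "->", x)
```

The checker is what keeps this from being garbage in, garbage out: only verified outputs are kept, which is roughly the role the game result plays for AlphaZero. For open-ended text there is usually no checker this clean, and that is the hard part.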