by linolevan
106 days ago
There was a very interesting paper out of Stanford this past September about pretraining in the unlimited-compute, limited-data paradigm [0]. Pretty much exactly the same thing, but with ~200M training tokens instead.

[0] https://www.alphaxiv.org/abs/2509.14786