by keeda
9 days ago
> Second, clean data. MAI-Thinking-1 was trained on clean and appropriately licensed data, with AI-generated content excluded from pre-training. This matters for quality, provenance, and control. If we cannot account for what shaped a model, we cannot fully understand its behavior or credibly improve it.

Shots fired? It would be interesting to see how far "clean data" can go on the scaling laws.

P.S. A fairly basic website otherwise, but unfortunately it seems to be hijacking scrolling for no good reason.