by McBainiel
642 days ago
> Microsoft used LLMs to write millions of short stories and textbooks in which one thing builds on another. The result of training on this text, Bubeck says, is a model that fits on a mobile phone but has the power of the initial 2022 version of ChatGPT.
I thought training LLMs on content created by LLMs was ill-advised, but this would suggest otherwise.
The exact training details are proprietary, but they seem to use a lot of GPT-4-generated training data.
On that note... I've often wondered if broad memorization of trivia is really a sensible use of precious neurons. It seems like a system trained on a narrower range of high-quality inputs would be much more useful (to me) than one that memorized billions of things I have no interest in.
At least at the small model scale, the general knowledge aspect seems to be very unreliable anyways -- so why not throw it out entirely?