by andai
642 days ago
Look into Microsoft's Phi papers. The whole idea there is that if you train models on higher-quality data (i.e. textbooks instead of blogspam), you get higher-quality results. The exact training details are proprietary, but they seem to use a lot of GPT-4-generated training data.

On that note... I've often wondered if broad memorization of trivia is really a sensible use of precious neurons. It seems like a system trained on a narrower range of high-quality inputs would be much more useful (to me) than one that memorized billions of things I have no interest in. At least at the small-model scale, the general-knowledge aspect seems to be very unreliable anyway, so why not throw it out entirely?