by famouswaffles
1160 days ago
1. No it's not lol. If the model had been trained on only that much data, it wouldn't be anywhere near as good in French. 1.8% is only enough here because it was trained on other languages as well. GPT-3 is also fluent in languages with less training data.

3. LLMs trained on code score noticeably higher on reasoning benchmarks.
1.8% does look like a small number, but imagine (I know it's hard in this day and age of 4TB fingernail-sized USB sticks) a physical library holding good old-fashioned paper artifacts. What would 1.8% of that look like?