Hacker News
by famouswaffles 1160 days ago
1. No it's not lol. If the model had only been trained on that much data, it wouldn't be anywhere near as good in French. 1.8% is only enough here because it was trained on other languages as well.

GPT-3 is also fluent in languages with less training data.

3. LLMs trained on code score noticeably higher on reasoning benchmarks.


lol?

1.8% does look like a small number, but imagine (I know it's hard in this day and age, with 4TB fingernail-sized USB sticks) a physical library holding good old-fashioned paper artifacts, and what 1.8% of that would look like.
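To put rough numbers on the library comparison, here is a back-of-envelope sketch in Python. The corpus size (300 billion tokens) and the length of a "typical book" (100,000 tokens) are assumptions picked for illustration, not figures from this thread; swap in whatever numbers you trust.

```python
# Back-of-envelope: how big is a 1.8% slice of a large text corpus?
# All sizes below are illustrative assumptions, not published figures.

TOTAL_TOKENS = 300_000_000_000  # assumed corpus size in tokens (hypothetical)
SHARE_PER_MILLE = 18            # 1.8% expressed as 18 per 1000, keeps the math in exact integers
TOKENS_PER_BOOK = 100_000       # assumed length of one typical book (hypothetical)


def slice_tokens(total_tokens: int, share_per_mille: int) -> int:
    """Number of tokens in a share of the corpus given in parts per thousand."""
    return total_tokens * share_per_mille // 1000


def as_books(tokens: int, tokens_per_book: int) -> int:
    """Convert a token count into an approximate number of whole books."""
    return tokens // tokens_per_book


if __name__ == "__main__":
    share_tokens = slice_tokens(TOTAL_TOKENS, SHARE_PER_MILLE)
    books = as_books(share_tokens, TOKENS_PER_BOOK)
    print(f"1.8% of {TOTAL_TOKENS:,} tokens = {share_tokens:,} tokens")
    print(f"at {TOKENS_PER_BOOK:,} tokens per book, that is about {books:,} books")
```

Under those assumptions the 1.8% slice comes out to about 5.4 billion tokens, or roughly 54,000 books: small as a percentage, but a decent-sized physical library.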