by kevingadd
1161 days ago
"there's so much more English language for them to train on relative to most other languages" is an interesting assertion. Billions of people on earth speak languages other than English, and they have access to the internet. Are you sure it's not just the case that we didn't scrape that data? Everyone has to choose what data to train on; you can't train on The Entire Internet, since it's an effectively limitless amount of data. But that makes it an intentional choice with consequences, like the 15.77x seen here.
|
Isn't that exactly how OpenAI managed to 10x GPT-3.5 with GPT-4?