Hacker News
by belval 1249 days ago
It's likely due to the corpus, though. It's multilingual, but the dataset they trained on is representative of "the Internet", so Latin-script languages (English, French, Spanish, Italian, German, etc.) are overrepresented.