by curioussquirrel 60 days ago
One more thing: we're working on a multilingual benchmark that will evaluate core linguistic proficiency in 30 languages. We already have a lot of data internally and I can tell you that:

- Gemini 3 Pro is a multilingual monster.

- GPT-5.4 is a really good translation model, a big improvement over earlier point releases in the GPT-5 family.

- Opus 4.6 is good but usually comes in third.

- Somehow, Grok 4.20 is surprisingly good at some long-tail languages? Its performance profile is really odd, unlike that of any of the other models.

EDIT: layout