by marinhero 974 days ago
How well do LLMs like this work with non-English languages? Or are these open-source models limited to English?
3 comments

Quite a few of the top-ranked models on this leaderboard are multilingual: https://huggingface.co/spaces/mteb/leaderboard

FlagEmbedding (https://huggingface.co/BAAI/bge-large-en-v1.5), for example, describes itself as covering Chinese and English.

Stability has a Japanese port that is getting a lot of work: https://twitter.com/StabilityAI_JP/status/171699857824440759...
This is not an embedding model, though. Yes, you can always extract some embeddings from somewhere, but for most LLMs those won't perform well for retrieval (which makes sense, as it's not what the models are optimized for).
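To make "extract some embeddings from somewhere" concrete: one common approach is to mean-pool a model's per-token hidden states into a single vector, then compare vectors by cosine similarity. A minimal sketch in plain Python, assuming made-up toy numbers in place of real hidden states (the helper names mean_pool and cosine_similarity are hypothetical, not from any library):

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real (non-padding) tokens into one embedding."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for hidden states: 3 token positions, 2 dimensions each.
# The last position is padding and is masked out (illustrative values only).
query_states = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
query_mask = [1, 1, 0]
doc_states = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
doc_mask = [1, 1, 0]

query_emb = mean_pool(query_states, query_mask)
doc_emb = mean_pool(doc_states, doc_mask)
score = cosine_similarity(query_emb, doc_emb)
print(query_emb, doc_emb, score)
```

The pooling step works on any model's hidden states; whether the resulting similarity scores are useful for retrieval depends on what the model was trained for, which is the point above.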
This isn't an embedding model, but it is a group of people working in this general area in a language other than English. Maybe they'll get to an embedding model next?
That depends on whether the training data contained languages other than English.