| HN Mirror



	by marinhero 974 days ago
	How well do LLMs like this work with non-English languages? Or are these open source models limited to English?

3 comments

simonw 974 days ago

Quite a few of the top ranked models on this leaderboard are multilingual: https://huggingface.co/spaces/mteb/leaderboard

FlagEmbedding, for example, describes itself as covering Chinese and English: https://huggingface.co/BAAI/bge-large-en-v1.5


anigbrowl 973 days ago

Stability has a Japanese port which is getting lots of work https://twitter.com/StabilityAI_JP/status/171699857824440759...


m3at 973 days ago

This is not an embedding model, though. Yes, you can always extract some embeddings from somewhere, but for most LLMs those won't perform well for retrieval (which makes sense, as retrieval is not what the models are optimized for).
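To make "extract some embeddings from somewhere" concrete: one common way is to average a model's per-token hidden states into a single vector and then rank documents by cosine similarity. The sketch below only shows those mechanics. The token vectors are made-up toy numbers, not output from any real model, and the helper names (`mean_pool`, `cosine`, `rank`) are hypothetical. It says nothing about retrieval quality, which is the part that depends on how the model was trained.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    return sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )

# Toy stand-ins for hidden states: one 2-d vector per token (assumed values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
docs_tokens = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 1.0], [2.0, 2.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]

query_vec = mean_pool(query_tokens)
doc_vecs = [mean_pool(tokens) for tokens in docs_tokens]
ranking = rank(query_vec, doc_vecs)
print(ranking)
```

A dedicated embedding model would slot into the same ranking step; the difference is that its vectors are trained so that this similarity score tracks relevance.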


anigbrowl 973 days ago

This isn't an embedding model, but it is a group of people working in this general area in a language other than English. Maybe they'll get to an embedding model next?


ttul 973 days ago

That depends on whether the training data contained languages other than English.
