| Transformer models handle multilingual text directly. For good old embedding models (e.g. GloVe), you have a few choices: 1. LASER, as you mentioned. The performance tends to suck though. 2. Language prediction + one embedding model per supported language. Libraries like whichlang make this nice, and MUSE has embedding models aligned per language for 100ish languages. Fastembed is a good library for this. Note that for most people, 32-dimension GloVe is all they need, if you benchmark it. As the length of the text you're embedding goes up, or as the specificity goes up (e.g. you have only medical documents and want to tell them apart), you'll need richer embeddings (more dimensions, or a transformer model, or both). People never benchmark their embeddings, and I find it incredible how they end up with needlessly overengineered systems. |
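
To make option 2 concrete, here is a minimal sketch of the routing idea: guess the language, pick that language's vector table, and mean-pool the word vectors. Everything in it is a toy stand-in, not a real API. `detect_language` is a hypothetical stopword counter standing in for a proper identifier like whichlang, and `VECTORS` is a hand-made 3-dimensional table standing in for real per-language vectors that have been aligned into one shared space.

```python
import math

# Hypothetical stand-in for a language identifier: a few stopwords per language.
STOPWORDS = {
    "en": {"the", "a", "is", "and"},
    "es": {"el", "la", "es", "y"},
}

# Hand-made toy vectors, assumed to be aligned so that translations sit close
# together. A real setup would load one pre-trained table per language.
VECTORS = {
    "en": {"cat": [0.9, 0.1, 0.0], "car": [0.0, 0.2, 0.9], "sleeps": [0.1, 0.9, 0.1]},
    "es": {"gato": [0.88, 0.12, 0.02], "coche": [0.02, 0.18, 0.92], "duerme": [0.12, 0.88, 0.1]},
}
DIM = 3


def detect_language(text, default="en"):
    """Return the language whose stopwords appear most often, else the default."""
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def embed(text):
    """Route to the detected language's table and mean-pool the known words."""
    table = VECTORS[detect_language(text)]
    hits = [table[t] for t in text.lower().split() if t in table]
    if not hits:
        return [0.0] * DIM  # nothing recognised: fall back to a zero vector
    return [sum(column) / len(hits) for column in zip(*hits)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


if __name__ == "__main__":
    query = embed("the cat sleeps")
    for candidate in ["el gato duerme", "el coche"]:
        print(candidate, round(cosine(query, embed(candidate)), 3))
```

Because the tables share one space, the English query lands next to its Spanish translation and far from the unrelated sentence. The same `cosine` loop over a handful of labelled pairs is also the cheapest way to benchmark a small table against a bigger model before committing to the bigger one.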