Hacker News
by interloxia 912 days ago
Consider using fastText's word vectors. They cover a bunch of languages, come pre-sorted by frequency, and are sufficient for basic word sense. Perhaps use an LLM to automate some of the disambiguation.

https://fasttext.cc/docs/en/crawl-vectors.html
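Here is a minimal Python sketch of the idea. It assumes the text (.vec) format described on that page: a header line with the word count and dimension, then one word per line followed by its space-separated vector values, sorted by descending frequency. Reading only the first N lines therefore gives you the N most common words, and cosine nearest neighbours give a rough feel for a word's sense. The function names are hypothetical and not part of fastText.

```python
import math


def load_top_vectors(lines, limit):
    """Read up to `limit` words from fastText's text (.vec) format.

    The first line is a header "<num_words> <dim>". Each later line is a word
    followed by `dim` floats. Because the words are sorted by descending
    frequency, stopping early keeps only the most common words.
    """
    lines = iter(lines)  # accept an open file or a plain list of lines
    dim = int(next(lines).split()[1])
    vectors = {}
    for line in lines:
        if len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[word] = [float(v) for v in values]
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors, word, k=5):
    """Brute-force k nearest neighbours of `word` by cosine similarity."""
    query = vectors[word]
    scored = [(cosine(query, v), w) for w, v in vectors.items() if w != word]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]
```

For a downloaded file such as cc.en.300.vec.gz, you could pass `gzip.open(path, "rt", encoding="utf-8")` to `load_top_vectors` with a limit of a few tens of thousands to keep memory manageable. The neighbour search is a linear scan, so for large vocabularies a vectorised version (e.g. with NumPy) would be much faster.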

https://news.ycombinator.com/item?id=13771292 (6 years ago)

Aligning the fastText vectors of 78 languages

https://github.com/babylonhealth/fastText_multilingual/blob/...
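A rough sketch of cross-lingual lookup with aligned vectors. It assumes you have loaded one square alignment matrix per language that maps that language's word vectors (as row vectors) into a shared space. How the linked repo stores and applies its matrices may differ, so check its README. The names `apply_transform` and `translate` are hypothetical. Once both languages live in the same space, a source word's nearest target-language neighbours are candidate translations.

```python
import math


def apply_transform(vec, matrix):
    """Row vector times square matrix: out[j] = sum_i vec[i] * matrix[i][j]."""
    n = len(vec)
    return [sum(vec[i] * matrix[i][j] for i in range(n)) for j in range(len(matrix[0]))]


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def translate(word, src_vecs, src_matrix, tgt_vecs, tgt_matrix, k=3):
    """Rank target-language words by cosine similarity in the shared space.

    Target vectors are mapped on every call for simplicity; in practice you
    would map and normalise them once up front.
    """
    query = normalize(apply_transform(src_vecs[word], src_matrix))
    scored = []
    for w, v in tgt_vecs.items():
        mapped = normalize(apply_transform(v, tgt_matrix))
        scored.append((sum(a * b for a, b in zip(query, mapped)), w))
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]
```

If the matrices are orthogonal, as in Procrustes-style alignment, the mapping preserves vector lengths, so normalising before or after the transform gives the same cosine scores.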


Thanks, I'll look into these.