by janalsncm 756 days ago
I also don’t quite understand the value of embedding all languages into the same database. If I search for “dog” do I really need to see the same article 300 times?

As a first step they are using PQ anyways. It seems natural to just assume all English docs have the same centroid and search that subspace with hnswlib.
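To make the idea concrete, here is a rough sketch of "partition by language, then search only that partition". It is not what TFA does: the toy vectors, the document ids, and the `build_partitions` / `search` helpers are all made up for illustration, and a brute-force cosine scan stands in for the hnswlib index you would actually build per partition.

```python
import math

# Toy corpus: (doc_id, language, embedding). All values are invented.
DOCS = [
    ("en-dog",   "en", [0.90, 0.10, 0.00]),
    ("de-hund",  "de", [0.90, 0.10, 0.05]),
    ("en-cat",   "en", [0.10, 0.90, 0.00]),
    ("fr-chien", "fr", [0.88, 0.10, 0.10]),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_partitions(docs):
    """Group documents by language so each language is its own sub-index."""
    parts = {}
    for doc_id, lang, vec in docs:
        parts.setdefault(lang, []).append((doc_id, vec))
    return parts

def search(parts, query, lang, k=2):
    """Scan only the requested language partition (stand-in for an ANN index)."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in parts[lang]]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

parts = build_partitions(DOCS)
result = search(parts, [1.0, 0.0, 0.0], "en")
```

The near-duplicate German and French articles never show up because their partitions are never touched, which is the whole point of the complaint above.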


It's split by language. TFA builds an index on the English-language subset.
Ah, missed that.