
Hacker News

by nxa 443 days ago

Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.

The dictionary is based on WordNet (https://wordnet.princeton.edu/), not word2vec. It's just a plain lookup among precomputed embeddings (generated with mxbai-embed-large). And yes, I'm excluding words that are present in the query because…
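A minimal sketch of the lookup described above, assuming the embeddings have already been computed and stored in a plain `dict` from word to vector. The tiny three-dimensional `table` is made-up toy data, not mxbai-embed-large output, and `nearest_words` is a hypothetical helper name. In practice `query_vec` would come from embedding the query with the same model that produced the table.

```python
import math
import re

# Toy precomputed embeddings (made-up 3-d vectors, for illustration only).
table = {
    "king": [0.9, 0.1, 0.0],
    "queen": [0.85, 0.2, 0.05],
    "royal": [0.8, 0.15, 0.1],
    "apple": [0.0, 0.1, 0.95],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_words(query, query_vec, table, k=5):
    """Return the k words most similar to query_vec, skipping words in the query."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    scored = [
        (cosine(query_vec, vec), word)
        for word, vec in table.items()
        if word.lower() not in query_words  # don't echo the query back
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]

print(nearest_words("king", table["king"], table, k=2))  # ['queen', 'royal']
```

A linear scan is the simplest option here; an approximate nearest-neighbor index would only become worthwhile if the vocabulary were much larger.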

It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).


kaycebasques 443 days ago

(Question for anyone) how could I go about replicating this with Gemini Embedding? Generate and store an embedding for every word in the dictionary?


nxa 443 days ago

Yes, that's pretty much what it is. Watch out for homographs.
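A minimal sketch of the precompute step, assuming the dictionary is available as (word, gloss) pairs, which WordNet provides per sense. `build_embedding_table` and `fake_embed` are hypothetical names, and `embed` stands in for whatever embedding call you use (Gemini Embedding or otherwise); no real client API is shown. Embedding each sense together with its gloss is one possible way to keep homographs apart. It is an assumption, not necessarily what the original project does.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Vector = List[float]

def build_embedding_table(
    entries: Iterable[Tuple[str, str]],
    embed: Callable[[str], Vector],
) -> Dict[str, Vector]:
    """Embed every (word, gloss) dictionary entry once, ahead of query time.

    Homographs such as "bank" get one key per sense ("bank#1", "bank#2"),
    and the gloss is embedded together with the word so the senses get
    distinct vectors instead of being collapsed into one.
    """
    table: Dict[str, Vector] = {}
    sense_counts: Dict[str, int] = {}
    for word, gloss in entries:
        n = sense_counts.get(word, 0) + 1
        sense_counts[word] = n
        table[f"{word}#{n}"] = embed(f"{word}: {gloss}")
    return table

# Stand-in embedder for demonstration only; swap in a real embedding model call.
def fake_embed(text: str) -> Vector:
    return [float(len(text)), float(text.count("a"))]

entries = [
    ("bank", "sloping land beside a river"),
    ("bank", "a financial institution"),
    ("river", "a large natural stream of water"),
]
table = build_embedding_table(entries, fake_embed)
print(sorted(table))  # ['bank#1', 'bank#2', 'river#1']
```

The resulting dict can be written to disk with `json.dump` and loaded once at startup. At query time, strip the `#n` suffix before showing results or comparing them against the query's words.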
