| HN Mirror



	by ignoramous 697 days ago
	Mistral has published large language models, not embedding models? sgrep uses Google's Word2Vec to generate embeddings of the corpus and perform similarity searches on it, given a user query.
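
To illustrate the general idea (this is not sgrep's actual code), here is a minimal sketch of a Word2Vec-style similarity search. It assumes static per-word vectors that are averaged into a line embedding and ranked by cosine similarity. `TOY_VECTORS`, `embed`, `cosine`, and `search` are hypothetical names, and the vector values are made up; a real setup would load pretrained Word2Vec vectors instead.

```python
import math

# Toy 3-dimensional word vectors standing in for a trained Word2Vec model.
# Assumption: these values are invented purely for illustration.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "dog": [0.7, 0.3, 0.1],
    "car": [0.0, 0.1, 0.9],
    "engine": [0.1, 0.0, 0.8],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, lines, top_k=2):
    """Return up to top_k (score, line) pairs, most similar first."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for line in lines:
        v = embed(line)
        if v is not None:
            scored.append((cosine(q, v), line))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


corpus = ["the cat sat", "a dog barked", "car engine noise"]
results = search("kitten", corpus)
for score, line in results:
    print(f"{score:.3f}  {line}")
```

Note that the query "kitten" never appears in the corpus, yet the line about the cat ranks first because the two word vectors point in nearly the same direction. Each word has a single fixed vector regardless of the words around it.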


throwawaydummy 697 days ago

No, I got that. I asked because wouldn't embeddings generated by fine-tuned, transformer-based LLMs be more context-aware? I don't know much about the internals, so apologies if this was a dumb thing to say.


ignoramous 697 days ago

Embeddings come in handy to augment LLMs [0], but as you suspect, some people use LLMs themselves as embedding models, with varying degrees of success: https://www.reddit.com/r/LocalLLaMA/comments/12y3stx/embeddi... / https://huggingface.co/spaces/mteb/leaderboard

[0] https://simonwillison.net/2023/Oct/23/embeddings/
