Hacker News new | ask | show | jobs
by ignoramous 697 days ago
Mistral has published large language models, not embedding models? sgrep uses Google's Word2Vec to generate embeddings of the corpus and perform similarity searches on it, given a user query.
1 comments

No I got that I asked because wouldn’t embedding generated by fine tuned transformer based LLMs be more context aware? Idk much about the internals so apologies if this was a dumb thing to say
embeddings come in handy to augment LLMs [0], but as you suspect, some try LLMs themselves as an outright embedding model with varying degrees of success: https://www.reddit.com/r/LocalLLaMA/comments/12y3stx/embeddi... / https://huggingface.co/spaces/mteb/leaderboard

[0] https://simonwillison.net/2023/Oct/23/embeddings/