| HN Mirror



	by qq99 445 days ago
	I was wondering about this. I was hesitant to add embedding-based search to my app because I didn't want a round trip to the embedding API provider blocking every search on initial render. Granted, you can cache the embeddings for common searches. OTOH, I also don't want to render something without them, perform the embedding async, and then have to rebuild the results list once the embedding arrives. That seems hard to do sensibly from a UX perspective. To embed locally, you need access to the model, right? I just wonder how good those embeddings will be for semantic search compared to those from OpenAI/Google/etc. I do like the free/instant aspect though.
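	The caching idea mentioned above could look roughly like this. It is only a sketch: `EmbeddingCache` and `fake_embed` are hypothetical names, and `fake_embed` is a stand-in for whatever remote API or local model actually produces the vectors. The cache keys on a normalized form of the query so that trivial differences in case or spacing do not trigger a second embedding call.

```python
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """Small LRU cache for query embeddings (hypothetical helper)."""

    def __init__(self, embed_fn: Callable[[str], List[float]], max_size: int = 1024):
        self._embed_fn = embed_fn
        self._max_size = max_size
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()

    @staticmethod
    def _normalize(query: str) -> str:
        # Lowercase and collapse whitespace so near-identical queries share a key.
        return " ".join(query.lower().split())

    def get(self, query: str) -> List[float]:
        key = self._normalize(query)
        if key in self._store:
            # Cache hit: mark as most recently used and skip the embedding call.
            self._store.move_to_end(key)
            return self._store[key]
        # Cache miss: embed the normalized text and remember the result.
        vector = self._embed_fn(key)
        self._store[key] = vector
        if len(self._store) > self._max_size:
            # Evict the least recently used entry.
            self._store.popitem(last=False)
        return vector


def fake_embed(text: str) -> List[float]:
    # Assumption: placeholder for a real embedding call (API or local model).
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = EmbeddingCache(fake_embed, max_size=2)
first = cache.get("Semantic  Search")
second = cache.get("semantic search")  # served from the cache
```

	This only helps with repeated queries; a query that has never been seen still pays the full embedding latency.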


jasonjmcghee 445 days ago

Check out MTEB (https://huggingface.co/spaces/mteb/leaderboard); many of the open source ones are actually _better_.

I've had particularly good experiences with nomic, bge, gte, and all-MiniLM-L6-v2. All are hundreds of MB (except all-MiniLM-L6-v2, which is about 87MB).


simonw 445 days ago

I love all-MiniLM-L6-v2 - 87MB is tiny enough that you could just load it into RAM in a web application process on a small VM. From my experiments with it the results are Good Enough for a lot of purposes. https://simonwillison.net/2023/Sep/4/llm-embeddings/#embeddi...


kaycebasques 444 days ago

87MB is still quite big, though. Think of all the comments here on HN where people were appalled at a certain site loading 10-50 MB of images. Hopefully browser vendors will figure out a secure way to download a model once and re-use that single copy on any website that requests it, rather than potentially downloading a separate instance of all-MiniLM-L6-v2 for each site. I know that Chrome has an AI initiative, but I didn't see any docs about this particular problem: https://developer.chrome.com/docs/ai


jasonjmcghee 444 days ago

It's crazy because Chrome ships an embedding model; it's just not accessible to users / developers (afaik).

https://dejan.ai/blog/chromes-new-embedding-model/


Ey7NFZ3P0nzAe 444 days ago

Personally I hate it because it has a very short context length and *silently* crops the text after a tweet-like text size. I've been on a crusade about this on GitHub and nobody seems to know this.

My go-to right now is on ollama: snowflake-arctic-embed2
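
One way to guard against the silent cropping described above is to split long text into overlapping chunks before embedding, so nothing falls off the end unnoticed. A rough sketch, with loud assumptions: `split_into_chunks` is a hypothetical helper, whitespace-separated words are only a crude proxy for model tokens (real tokenizers usually produce more tokens than words), and the `max_tokens=256` default is an illustrative value, not the documented limit of any model named in this thread.

```python
from typing import List


def split_into_chunks(text: str, max_tokens: int = 256, overlap: int = 32) -> List[str]:
    """Split text into overlapping word windows (hypothetical helper).

    Assumption: words stand in for tokens; use the model's own tokenizer
    and documented limit for anything beyond a rough guard.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    words = text.split()
    if not words:
        return []
    if len(words) <= max_tokens:
        # Short enough to embed in one piece.
        return [" ".join(words)]

    step = max_tokens - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            # This window already reached the end of the text.
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
pieces = split_into_chunks(sample, max_tokens=4, overlap=1)
```

Each chunk would then be embedded separately; how to combine or rank the per-chunk vectors is a separate design choice.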
