	by kacperlukawski 1114 days ago
	If you only need semantic search locally, that's fine, but serving an embedding model might still be challenging. And if you want to expose it publicly, your laptop might not be enough.

3 comments

hn_20591249 1114 days ago

I've hosted embedding models on AWS Lambda (fair, that is still a vendor, but it's 1 vs. 3). If you try an LLM with 1B+ parameters you will struggle, but if the difference between a lightweight BERT-like transformer and an LLM is only a few % of loss, why bother getting your credit card out?

Edit: another thought: skip Lambda entirely, run the embedding job on the server as a background process, and use an on-disk vector store (LanceDB).
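
A minimal sketch of that background-job idea, under loud assumptions: `toy_embed` is a hypothetical hashing stand-in for a small BERT-like model, and a SQLite file stands in for an on-disk vector store such as LanceDB (this is not LanceDB's API). The names `embedding_worker` and `start_worker` are made up for illustration.

```python
import hashlib
import json
import queue
import sqlite3
import threading

DIM = 8  # toy dimensionality; a real embedding model produces far more


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hash tokens into DIM buckets."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    return vec


def embedding_worker(jobs: queue.Queue, db_path: str) -> None:
    """Drain the job queue and persist each vector to an on-disk SQLite table."""
    conn = sqlite3.connect(db_path)  # opened inside the worker thread that uses it
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vectors "
        "(doc_id TEXT PRIMARY KEY, text TEXT, vec TEXT)"
    )
    while True:
        item = jobs.get()
        if item is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        doc_id, text = item
        conn.execute(
            "INSERT OR REPLACE INTO vectors VALUES (?, ?, ?)",
            (doc_id, text, json.dumps(toy_embed(text))),
        )
        conn.commit()
        jobs.task_done()
    conn.close()


def start_worker(db_path: str) -> tuple[queue.Queue, threading.Thread]:
    """Start the background embedding thread and return its job queue."""
    jobs: queue.Queue = queue.Queue()
    thread = threading.Thread(
        target=embedding_worker, args=(jobs, db_path), daemon=True
    )
    thread.start()
    return jobs, thread
```

The server would call `start_worker("vectors.db")` once at startup, push `(doc_id, text)` tuples onto the queue as documents arrive, and push `None` to shut the worker down. Request handling never waits on the embedding step.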

binarymax 1114 days ago

Shameless plug: I built Mighty Inference Server to solve this problem. Fast embeddings with a minimal footprint. Better BEIR and MTEB scores using the lightning-fast and small E5 V2 models. Scales linearly on CPU, no GPU needed.

https://max.io

llogiq 1114 days ago

The initial version of this actually used Mighty, but I didn't find any free tier available, so I switched to Cohere to keep the $0 price tag.

binarymax 1114 days ago

Mighty is free if you're not making money from it. You could have used Mighty and I would have been glad to help you set it up :)

kybernetikos 1114 days ago

There's a bit of a difference between what you see following the 'purchase' link and what you see if you scroll down to 'pricing' on your site. It confused me at first too - I'm just so used to seeing a 'pricing' link in the top bar, I pretty much always go there first to see if there's a reasonable free tier for me to play with something.

binarymax 1114 days ago

Thanks for the feedback! I'll do my best to make things more clear.

bootsmann 1114 days ago

You serve the embedding model in a lambda and then run something like FAISS in the backend.
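
A rough sketch of that split, with everything beyond the comment itself being an assumption: `handler(event, context)` follows the usual shape of a Python AWS Lambda entry point, `toy_embed` is a hypothetical stand-in for the real model, the `"text"` request field is invented, and `search` is an exact brute-force cosine scan standing in for a FAISS index (it does not use FAISS).

```python
import hashlib
import json
import math

DIM = 8  # toy dimensionality; a real embedding model produces far more


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hash tokens into DIM buckets."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    return vec


def handler(event: dict, context: object) -> dict:
    """Lambda-style entry point: embed the text found in the (assumed) 'text' field."""
    embedding = toy_embed(event["text"])
    return {"statusCode": 200, "body": json.dumps({"embedding": embedding})}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Backend side: return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]
```

In this layout the backend calls the function to embed the query, then runs the search against vectors it already holds. A linear scan like this is fine for small collections; an approximate index becomes worthwhile as the collection grows.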