| HN Mirror



	by hn_20591249 1079 days ago
	Am I being unreasonable to find it bizarre that the tutorial begins with subscribing to 3 different SaaS vendors? Especially since these days you can run a vector store on disk if you have fewer than 10 million records, pull any free embedding model straight from HuggingFace, and run it on consumer hardware (your laptop).
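To give a sense of how little machinery the on-disk approach needs, here is a minimal sketch of an exact nearest-neighbour search over vectors stored in a flat file. It uses only the Python standard library. The file layout, the function names (`write_vectors`, `knn`), and the toy vectors are assumptions made for illustration; this is not the API of FAISS, LanceDB, or any other library named in the thread, and real embeddings would come from a model, which is left out here.

```python
import heapq
import math
import mmap
import os
import struct
import tempfile

DIM = 4  # toy dimensionality (assumption); real embedding models use hundreds
RECORD = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector per record


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def write_vectors(path, vectors):
    """Write unit-length vectors back to back into a flat binary file."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*normalize(vec)))


def knn(path, query, k=3):
    """Return the k best (similarity, row_id) pairs by brute-force cosine similarity."""
    q = normalize(query)
    # Memory-map the file so the OS pages vectors in on demand.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        scored = (
            (sum(a * b for a, b in zip(q, RECORD.unpack_from(mm, i * RECORD.size))), i)
            for i in range(len(mm) // RECORD.size)
        )
        return heapq.nlargest(k, scored)  # consumed while the map is still open


# Usage: three toy vectors; the query is closest to rows 0 and 2.
demo_path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(demo_path, [[1, 0, 0, 0], [0, 1, 0, 0], [0.9, 0.1, 0, 0]])
top = knn(demo_path, [1, 0, 0, 0], k=2)
print([row for _, row in top])  # → [0, 2]
```

This scans every record, so it is exact but linear in the number of vectors. The libraries discussed in the thread add approximate indexes to avoid a full scan at larger sizes.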

2 comments

WinLychee 1079 days ago

Doesn't match how things are run in production these days. As a vendor, you need to target the customer's environment as closely as possible. Even if it's theoretically feasible to serve off a single machine, you should have a cloud-native setup ready to go.

In principle you could totally run this on a single bare-metal node, but most will not be doing that in practice.


hn_20591249 1079 days ago

> you should have a cloud-native setup ready to go

Why is storing the data in an on-disk FAISS/LanceDB vector store not "cloud native"? I am running this setup in production across dozens of nodes. We migrated all of our infrastructure off Pinecone to this solution and have seen a 10x drop in latency, and the cost improvements have been dramatic (from paid to totally free).

I have a bit of an axe to grind in the vector DB space. It feels like the industry has gaslit developers over the last year or so into thinking SaaS is necessary for vector retrieval, when low-latency on-disk KNN across vectors is a solved problem.


llogiq 1079 days ago

I totally agree that the latency of this solution leaves a lot of room for improvement. But that's totally beside the point of the article, which is that people can get no-cost semantic search for their personal website using those services. They can also use other solutions, of course.

Also, I'm experimenting with integrating things further to reduce latency, and will most likely publish another article within the month. Stay tuned.

Finally, I somewhat agree that many of the players in the vector DB space try to push their cloud offerings. Which is fine; how else should they make money? And if latency matters that much to you, Qdrant offers custom deployments, too. I believe running Qdrant locally will handily beat your LanceDB solution perf-wise unless you're talking about fewer than 100k entries. We have both Docker containers and release binaries for all major OSes, so why not give it a try?


WinLychee 1079 days ago

That's fantastic! Many organizations (arguably most) are not running their tech/infrastructure that well or that competently. For a lot of organizations, it makes sense to externalize anything that's not a core competency directly related to their business. For them, less infra and less code is "better". Depending on how the accounting is done, it might also be better to have a "vendor" expense rather than an "internal team" expense, which requires staffing.

All that is to say, maybe there's a lot of money in the SAAS/big cloud space, and customers willing to run their own setup that requires tuning might not be willing to hand them large sums of money? Just theorizing here!

Oh also "cloud native" is like a marketing term vaguely saying "you can hook this into other cloud stuff" and it works with K8s/whatever cloud thingy.


kacperlukawski 1079 days ago

If you need semantic search locally then it's fine, but serving an embedding model might still be challenging. And if you want to expose it, your laptop might not be enough.


hn_20591249 1079 days ago

I've hosted embedding models on AWS Lambda (fair that this is a vendor, but 1 vs. 3). If you try an LLM with 1B+ parameters you will struggle, but if the difference between a lightweight BERT-like transformer and an LLM is only a few % of loss, why bother getting your credit card out?

Edit: another thought: skip Lambda entirely, run the embedding job on the server as a background process, and use an on-disk vector store (LanceDB).
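The background-job idea in that edit could look roughly like the sketch below. Everything in it is an assumption made for illustration: a background thread stands in for the background process, a SQLite file stands in for the on-disk vector store (it is not LanceDB), and `embed` is a hypothetical hashed bag-of-words placeholder where a real embedding model would go.

```python
import hashlib
import os
import queue
import sqlite3
import struct
import tempfile
import threading

DIM = 8  # toy vector size (assumption)


def embed(text):
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec


def embedding_worker(jobs, db_path):
    """Drain the queue, embed each document, and persist it to a SQLite file on disk."""
    db = sqlite3.connect(db_path)  # opened in the same thread that uses it
    db.execute("CREATE TABLE IF NOT EXISTS vectors (doc_id TEXT PRIMARY KEY, vec BLOB)")
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: stop the worker
            break
        doc_id, text = job
        blob = struct.pack(f"<{DIM}f", *embed(text))
        db.execute("INSERT OR REPLACE INTO vectors VALUES (?, ?)", (doc_id, blob))
        db.commit()
    db.close()


def start_worker(db_path):
    """Start the worker thread and return the queue the server would push jobs to."""
    jobs = queue.Queue()
    thread = threading.Thread(target=embedding_worker, args=(jobs, db_path), daemon=True)
    thread.start()
    return jobs, thread


# Usage: the request path only enqueues; the embedding happens off to the side.
demo_db = os.path.join(tempfile.mkdtemp(), "vectors.sqlite")
jobs, thread = start_worker(demo_db)
jobs.put(("post-1", "semantic search on a laptop"))
jobs.put(("post-2", "on-disk vector stores"))
jobs.put(None)  # ask the worker to finish
thread.join()
```

The point of the queue is that request handling never waits on the model. A separate OS process or a cron job would serve the same purpose.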


binarymax 1079 days ago

Shameless plug: I built Mighty Inference Server to solve this problem. Fast embeddings with minimal footprint. Better BEIR and MTEB scores using the lightning fast and small E5 V2 models. Scales linearly on CPU, no GPU needed.

https://max.io


llogiq 1079 days ago

The initial version of this actually used Mighty, but I didn't find any free tier available, so I switched to Cohere to keep the $0 pricetag.


binarymax 1079 days ago

Mighty is free if you're not making money from it. You could have used Mighty and I would have been glad to help you set it up :)


kybernetikos 1079 days ago

There's a bit of a difference between what you see following the 'purchase' link and what you see if you scroll down to 'pricing' on your site. It confused me at first too - I'm just so used to seeing a 'pricing' link in the top bar, I pretty much always go there first to see if there's a reasonable free tier for me to play with something.


binarymax 1079 days ago

Thanks for the feedback! I'll do my best to make things more clear.


bootsmann 1079 days ago

You serve the embedding model in a lambda and then run something like FAISS in the backend.
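As a rough sketch of the Lambda half of that setup: the `handler(event, context)` signature and the `statusCode`/`body` response shape follow the usual AWS Lambda proxy-integration convention for Python, but the request format (`{"text": ...}`) is an assumption, and `embed` is a hypothetical placeholder for whatever model would really be loaded. The vector-search backend is not shown.

```python
import json
import math


def embed(text, dim=8):
    """Hypothetical placeholder for a real model, which would be loaded once per container."""
    vec = [0.0] * dim
    for i, byte in enumerate(text.encode("utf-8")):
        vec[(byte + i) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, ready for cosine similarity


def handler(event, context):
    """Lambda entry point; expects a proxy-style event whose body is JSON like {"text": "..."}."""
    try:
        text = json.loads(event.get("body") or "{}")["text"]
        if not isinstance(text, str):
            raise TypeError("text must be a string")
    except (KeyError, ValueError, TypeError):
        return {"statusCode": 400, "body": json.dumps({"error": "expected JSON body with 'text'"})}
    return {"statusCode": 200, "body": json.dumps({"embedding": embed(text)})}


# Usage: call the handler locally with a fake event, as a test harness would.
response = handler({"body": json.dumps({"text": "vector search"})}, None)
print(response["statusCode"])  # → 200
```

The backend would then take the returned vector and pass it to its nearest-neighbour index as the query.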
