You Don't Need a Vector Database

	You Don't Need a Vector Database (vecstore.app)
	20 points by kencho 103 days ago

7 comments

exhost 103 days ago

> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.

kenforthewin 103 days ago

If I understand correctly, the vecstore product is an api that wraps a vector database (among other features). So the pitch here is something like “you don’t need a vector database, you need an api wrapper to a vector database that we manage for you”.

EspadaV9 103 days ago

If you need to store vectors, surely the easiest solution is to run `CREATE EXTENSION vector`. When, or if, you need more, then look at alternatives, but I'm sure most people will have a much easier time just adding it to their existing database.

canpan 103 days ago

Pgvector is definitely the easiest way to get started. I did not really get the problem the article was trying to sell.

Setting up an ingestion pipeline to your existing db, vs ingesting into yet another db seems to not solve a problem I have.

If there was one thing I wish pgvector was better at, it would be to allow composite indexes (i.e., find vector where category). But it's a minor point.

EspadaV9 103 days ago

I was genuinely surprised just how easy it was to get a fully working RAG setup with Postgres. It was a few hours over a weekend to get something "working" and then probably a bit less time a following weekend to have a nicer database structure and rebuild it learning from the mistakes during the first attempt. The harder part comes next, because that involves multiple tables of user-provided data, multi-tenancy with a shared core vector schema, and all the actual business logic, so I've put it all on hold for a real breakdown now, but I wouldn't expect it to be much of a problem with what I've found so far with pgvector, and Postgres in general.

kencho 103 days ago

the article was aimed more at teams that don't have an existing postgres setup and are evaluating standalone vector databases from scratch. if you're already running postgres with pgvector, you're in a good spot

SirHumphrey 103 days ago

I was asking myself the whole article “what does this mysterious semantic search api actually do?” and was a bit underwhelmed when the result came out to be - managed vector database.

wat10000 103 days ago

When they generated this post, they set their LLM to “long and obfuscated” instead of “concise and clear.”

kencho 103 days ago

you're completely wrong. we also added "make no mistakes" in the end

Raro 103 days ago

If this were on Mad Men:

"You don't need a hamburger... you need McDonald's".

kencho 103 days ago

fair point on the framing. we do use vector search internally, so calling it "not a vector database" is a stretch. the argument i was trying to make is more narrow: most teams evaluating pinecone or qdrant don't need to operate the vector layer themselves. they need search results from an api call. whether that api uses vectors, BM25, or hamsters under the hood shouldn't matter to them. i could have been clearer about that

_puk 103 days ago

So the classic build vs buy question

noemit 103 days ago

The vector database obsession came from RAG, which came from a marketing idea to calm down enterprise fears about hallucination. Will save this article because I feel like I have this conversation weekly when people think they need a vector database for something they definitely do not.

kencho 103 days ago

Exactly. We talk to teams every week who spent a month setting up pinecone or qdrant and then realize they just needed search that worked. The vector database became the default answer to every search problem because of the RAG hype cycle, even when the actual need is way simpler

le-mark 103 days ago

Can you describe when the actual need is much simpler? I mean throwing documents into Elasticsearch is really easy and the search is really good.

kencho 103 days ago

one use case we're handling right now is for a large online auction marketplace. they needed to automatically categorize 40,000 newly uploaded images per week. no tags, no metadata from the sellers, just raw photos. elasticsearch can't look at an image and tell you it's a vintage rolex or a mid-century lamp. they needed search that understands visual content, not text

that's the kind of problem where keyword search doesn't apply at all, no matter how good the engine is

simonjgreen 103 days ago

> Spent a month setting up Pinecone? Really?

kencho 103 days ago

there's a lot more in "setting up" than creating an account and a collection on pinecone or any other service

alansaber 103 days ago

That RAG is marketing and doesn't significantly affect performance is incorrect. As to whether retrieval really benefits from vector DBs is another question.

noemit 102 days ago

True. RAG is worse in almost all real-world use cases. If you have fewer than 10,000 documents it's worse, and if you have too many documents it's also worse.

stephantul 103 days ago

This is a thinly veiled commercial, not really useful.

hbogert 93 days ago

You don't just need a vector db, you need ours!

cpursley 103 days ago

Or you could just use the new BM25 extension and, if that's not enough, bring in the vector extension, which you can run as hybrid and not have to bolt on yet another paid 3rd party thing:

https://postgresisenough.dev/tools?category=search

kencho 103 days ago

pgvector is great if you're already on postgres and only need text search. funny enough, Neon Postgres actually featured us in a case study about this exact topic. we replaced pinecone and rds with neon under the hood: https://neon.com/blog/vecstore-replacing-pinecone-and-rds-wi...

the gap shows up when you need image search, face search, or content moderation on top of text search. that's where a dedicated api makes more sense than rolling your own on postgres

alansaber 103 days ago

Exactly this, hybrid search (weighted in favor of a good sparse retrieval strategy) is universally the best way to go.
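For anyone wondering what "weighted in favor of sparse" might look like in practice, here is a rough sketch. It assumes you already have per-document scores from a sparse retriever (BM25 or similar) and from a dense one (cosine similarity); the function names, the min-max normalization, and the 0.7 weight are all illustrative assumptions, not something taken from any particular library or from the article.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as an equally good match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(sparse, dense, sparse_weight=0.7):
    """Fuse sparse and dense scores with a weighted sum and return doc ids, best first.

    sparse_weight=0.7 is an assumed value that favors the sparse retriever;
    a document missing from one retriever contributes 0 for that side.
    """
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: sparse_weight * s.get(doc, 0.0) + (1 - sparse_weight) * d.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Sort by fused score descending, then by doc id for a stable order.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Made-up scores for illustration only.
sparse_scores = {"a": 12.0, "b": 7.5, "c": 3.0}   # e.g. BM25
dense_scores = {"a": 0.62, "b": 0.91, "d": 0.85}  # e.g. cosine similarity
ranking = hybrid_rank(sparse_scores, dense_scores)
print(ranking)
```

Min-max scaling is only one way to put the two score ranges on the same footing; rank-based fusion is a common alternative when the raw scores are not trustworthy.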

sirfz 103 days ago

You don't need a vector db, you just need np.dot
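A minimal sketch of that idea, for small collections: score every stored vector against the query with a dot product and keep the top k. It is written in plain Python so it runs without dependencies; with NumPy and a matrix of unit-length rows, the scoring loop would collapse to a single `np.dot(matrix, query)`. The helper names and the toy vectors are made up for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query, vectors, k=3):
    """Brute-force nearest neighbours: score every vector, return (index, score) pairs."""
    q = normalize(query)
    scored = [(dot(q, normalize(v)), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]


# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
results = top_k([1.0, 0.05, 0.0], docs, k=2)
print(results)
```

This is a linear scan, so cost grows with the number of stored vectors; that is the trade-off against an approximate index.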
