by deepsquirrelnet 989 days ago
What advantage do vector databases provide over using an index in conjunction with a mature database? I’m not sold on this as a separate technology.

Vector search is useful, but I don’t understand why I would go out of my way when I could implement FAISS or HNSWlib as an adjunct to postgres or a document store.

3 comments

Vector extensions to your current database or search engine make far more sense than adding yet another dependency to manage and operate. The vector database folks will have to become a real database or a full-featured search engine to survive and compete with the incumbents, which will all have good solutions for vector similarity search.
The thing is, if you need a vector _database_, there is no reason why it can't be a pg extension. And if your project is only small scale, there is probably some HNSW pg extension you could use.

But what is needed most of the time, instead of a vector database, is an efficient, responsive approximate KNN vector search system with fast attribute filtering, which overlaps with a fast and efficient text search system (e.g. BM25-based).

And if you then go to billion-vector scale, things become tricky performance-wise.

And then you reach the same point at which companies adopt a warehouse-style approach: a read-only, heavily read-optimized, mostly in-memory variant of their db that is accessed for searches only, with changes from the main db streamed to the read-only search instance, potentially at the cost of snapshot views, transactions and similar.

You could say that approx. KNN vector search is the new must-have feature for unstructured fuzzy text search, and while you can have unstructured fuzzy text search in pg, it's often not the go-to solution if your database exists just to serve that search.

Why is text search so related to vector search, in your opinion?
Because any production use case I'm aware of sooner or later uses both searches and combines the results.

E.g. vector search is fundamentally terrible at finding keywords, but keyword search is fundamentally terrible at finding equivalent things that use slightly different words.
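
To make "combines the results" concrete, here is a minimal sketch of one common way to merge a vector ranking with a keyword ranking: reciprocal rank fusion (RRF). The thread does not say which fusion method anyone actually uses, so treat RRF itself, the constant k=60, and the document ids below as assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
vector_hits = ["doc3", "doc1", "doc7"]   # from the approximate KNN search
keyword_hits = ["doc1", "doc9", "doc3"]  # from the BM25-style text search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF only looks at ranks, not raw scores, which sidesteps the problem that cosine similarities and BM25 scores live on different scales.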

If you're interested in an approach like this, take a look at txtai.

1. https://neuml.github.io/txtai/embeddings/indexing/

2. https://neuml.hashnode.dev/external-database-integration

I love this idea. It seems like a very practical approach. I'm going to give this a try on my next project.
It's practical and simple. This approach just plugs the index ids of the similarity matches into the RDBMS query.
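
A rough sketch of that idea, not txtai's actual implementation: a similarity search returns row ids, and those ids go into an ordinary SQL query. Here a brute-force cosine scan stands in for a real ANN index such as FAISS or HNSWlib, and SQLite stands in for the RDBMS; the table, columns, and toy vectors are all made up for illustration.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings keyed by the row id used in the database.
vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [0.8, 0.2]}


def top_k_ids(query, k):
    """Stand-in for the vector index: exact scan, most similar ids first."""
    ranked = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "intro to vectors", 2021), (2, "vector indexes", 2023),
     (3, "cooking pasta", 2023), (4, "ANN search", 2022)],
)

# 1. Ask the index for candidate ids.
ids = top_k_ids([1.0, 0.0], k=3)

# 2. Plug the ids into a normal SQL query with attribute filters.
placeholders = ",".join("?" for _ in ids)
rows = conn.execute(
    f"SELECT id, title FROM articles WHERE id IN ({placeholders}) AND year >= ?",
    (*ids, 2022),
).fetchall()

# 3. SQL does not preserve the similarity order, so restore it.
position = {doc_id: pos for pos, doc_id in enumerate(ids)}
rows.sort(key=lambda row: position[row[0]])
print(rows)
```

One caveat with this sketch: the SQL filter runs after the similarity search, so it can return fewer than k rows. Over-fetching candidates from the index is the usual workaround.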