| HN Mirror



	by deepsquirrelnet 989 days ago
	What advantage do vector databases provide over using an index in conjunction with a mature database? I’m not sold on this as a separate technology. Vector search is useful, but I don’t understand why I would go out of my way when I could use FAISS or HNSWlib as an adjunct to Postgres or a document store.

3 comments

spullara 989 days ago

Vector extensions to your current database or search engine make far more sense than adding yet another dependency to manage and operate. The vector database folks will have to become a real database or full-featured search engine to survive and compete with the incumbents, which will all have good solutions for vector similarity search.


dathinab 989 days ago

The thing is, if you need a vector _database_ there is no reason why it can't be a pg extension. And if your project is only small scale, there is probably some HNSW pg extension you could use.

But what is needed most of the time, instead of a vector database, is an efficient, fast, responsive approximate KNN vector search system with fast attribute filtering, which overlaps with a fast and efficient text search system (e.g. BM25-based).

And if you then go to billion-vector scale, things become tricky performance-wise.

And then you reach the same point at which companies adopt a warehouse-style approach: a read-only, extremely read-optimized, mostly in-memory variant of their db that is accessed for searches only, with changes from the main db streamed to the read-only search instance, potentially at the cost of snapshot views, transactions and similar.

You could say that approx. KNN vector search is the new must-have feature for unstructured fuzzy text search, and while you can have unstructured fuzzy text search in pg, it's often not the go-to solution if your database exists just to serve that search.


AYBABTME 989 days ago

Why is text search so related to vector search, in your opinion?


dathinab 989 days ago

Because any production use case I'm aware of sooner or later uses both searches and combines the results.

E.g. vector search is fundamentally terrible at finding keywords, but keyword search is fundamentally terrible at finding equivalent things that use slightly different words.
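
To make "uses both searches and combines the results" concrete, here is a minimal sketch of one common way to merge two ranked lists, reciprocal rank fusion. The comment does not say which fusion method is meant, so treat RRF, the function name, the constant `k=60` and the sample document ids as assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (e.g. BM25) search and a vector search.
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both searches (doc1, doc3) end up ahead of documents found by only one, which is the behaviour you usually want from a hybrid setup.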


dmezzetti 989 days ago

If you're interested in an approach like this, take a look at txtai.

1. https://neuml.github.io/txtai/embeddings/indexing/

2. https://neuml.hashnode.dev/external-database-integration


deepsquirrelnet 989 days ago

I love this idea. It seems like a very practical approach. I'm going to give this a try on my next project.


dmezzetti 989 days ago

It's practical and simple. This approach just plugs the ids of the index's similarity matches into the RDBMS query.
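
A rough sketch of that idea, not txtai's actual implementation: a brute-force cosine search stands in for the vector index (FAISS, HNSWlib, etc.), and the matching ids are passed to an ordinary SQL query that also applies an attribute filter. The `docs` table, its columns, the toy vectors and the helper names are all made up for the example.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical toy "index": row id -> embedding. A real setup would use an ANN index.
vectors = {
    1: [1.0, 0.0],
    2: [0.9, 0.1],
    3: [0.0, 1.0],
    4: [0.8, 0.2],
}


def top_ids(query, limit):
    """Return the ids of the `limit` most similar vectors, best first."""
    ranked = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:limit]


# The relational side holds the content and attributes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "intro to vectors", "en"),
        (2, "vektor grundlagen", "de"),
        (3, "cooking pasta", "en"),
        (4, "vector indexes", "en"),
    ],
)

# Plug the similarity matches into the SQL query as an id filter.
ids = top_ids([1.0, 0.0], limit=3)
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, title FROM docs WHERE id IN ({placeholders}) AND lang = ?",
    (*ids, "en"),
).fetchall()

# SQL does not preserve the similarity order, so restore it afterwards.
rows.sort(key=lambda row: ids.index(row[0]))
print(rows)
```

Note that filtering after the similarity search can return fewer rows than `limit`; over-fetching from the index is the usual workaround.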
