Hacker News
by leyoDeLionKin 755 days ago
But why not just use a vector database like pgvector?
5 comments

In practice, a combination of full-text search and vector search often gives better results than either one alone. It's called hybrid search. Here's an article that talks a bit about this: https://opster.com/guides/opensearch/opensearch-machine-lear...

Often you take the results from both vector search and lexical search and merge them through algorithms like Reciprocal Rank Fusion.
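
For anyone curious what that merge looks like, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. The function name, the doc ids, and the two input rankings are made up for illustration; k=60 is the constant commonly used with RRF, but treat it as a tunable assumption rather than a required value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1), and the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id so output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a lexical (e.g. BM25) query and a vector query.
lexical = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical, vector])
print(fused)
```

Documents that rank well in both lists (doc_a, doc_c) float to the top, and RRF only needs ranks, so the lexical and vector scores never have to be put on the same scale.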

You can think of a full-text index as being like a vector database that's highly specialized and optimized for the use-case where your documents and queries are both represented as "bags of words", i.e. very high-dimensional and very sparse.

Which works great when you want to retrieve documents that actually contain the specific keywords in your search query, as opposed to using embeddings to find something roughly in the same semantic ballpark.
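
To make the "sparse vector" analogy concrete, here is a toy sketch, not how any real engine is implemented. Documents become term-count vectors, an inverted index stores only the non-zero entries, and a query is scored by cosine similarity. The whitespace tokenizer, the function names, and the sample documents are all assumptions for illustration; real engines use analyzers and BM25-style weighting.

```python
import math
from collections import Counter, defaultdict

def bag_of_words(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return Counter(text.lower().split())

def build_index(docs):
    """Return (inverted index, per-document vector norms)."""
    index = defaultdict(dict)   # term -> {doc_id: count}
    norms = {}
    for doc_id, text in docs.items():
        bag = bag_of_words(text)
        norms[doc_id] = math.sqrt(sum(c * c for c in bag.values()))
        for term, count in bag.items():
            index[term][doc_id] = count
    return index, norms

def search(index, norms, query):
    """Cosine similarity between the query and every doc sharing a term."""
    q = bag_of_words(query)
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    dots = defaultdict(float)
    for term, q_count in q.items():
        # Only documents containing the term are ever touched.
        for doc_id, d_count in index.get(term, {}).items():
            dots[doc_id] += q_count * d_count
    hits = [(doc_id, dot / (q_norm * norms[doc_id])) for doc_id, dot in dots.items()]
    return sorted(hits, key=lambda h: (-h[1], h[0]))

docs = {
    "d1": "postgres full text search",
    "d2": "vector search with embeddings",
    "d3": "cooking pasta at home",
}
index, norms = build_index(docs)
hits = search(index, norms, "full text search")
print(hits)
```

Note that d3 is never scored at all because it shares no term with the query. That is the sparsity the inverted index exploits.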

Check out Infinity (https://github.com/infiniflow/infinity), which combines vector search and full-text search and provides extremely fast search performance.
Infinity looks interesting, but I don't see any mention of support for clustering.
Infinity supports an HNSW vector index.
Vector databases are good for documents, but if you have a fact database or some other more succinct information store, vector retrieval is quite slow compared to trigram or full-text search, while often giving worse results.
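
For reference, trigram matching on short strings can be sketched in a few lines. This is loosely modeled on how trigram similarity is usually described (shared 3-character substrings divided by the union), but the padding scheme, the function names, and the sample facts are assumptions for illustration, and it treats the whole string as one word for simplicity.

```python
def trigrams(text):
    # Pad so that the start and end of the string form their own trigrams
    # (two leading spaces, one trailing space; an assumption of this sketch).
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a, b):
    """Jaccard similarity between the trigram sets of two strings."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# Hypothetical short "facts" and a query containing a typo.
facts = ["Mount Everest", "Mount Etna", "Lake Geneva"]
query = "mount everst"

best = max(facts, key=lambda fact: trigram_similarity(query, fact))
print(best, round(trigram_similarity(query, best), 4))
```

No model inference is needed at query time, and typos still match, which is part of why this tends to be fast on short entries.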
Because it’s a full text search engine, and not a text embedding? Different query types, requirements, indexing methods, etc.