HN Mirror

by panarky 994 days ago

I'd love to know how vector databases compare in their ability to do hybrid queries, vector similarity filtered by metadata values. For example, find the 100 items with the closest cosine similarity where genre = jazz and publication date between 1990 and 2000.

Can the vector index operate on a subset of records? Or when searching for 100 closest matches does the database have to find 1000 matches and then apply the metadata filter, and hope that doesn't reduce the result set down to zero and exclude relevant vectors?
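The two strategies in this question can be sketched with a brute-force search over a toy dataset. This is only an illustration under stated assumptions: the items, the `overfetch` parameter, and the helper names (`post_filter`, `pre_filter`, `is_90s_jazz`) are hypothetical and do not reflect how any particular vector database implements filtering.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(items, query, k):
    """Exact (brute-force) nearest neighbours by cosine similarity."""
    ranked = sorted(items, key=lambda it: cosine_similarity(it["vec"], query), reverse=True)
    return ranked[:k]

def post_filter(items, query, k, overfetch, predicate):
    """Fetch `overfetch` nearest items first, then apply the metadata filter."""
    candidates = top_k(items, query, overfetch)
    return [it for it in candidates if predicate(it)][:k]

def pre_filter(items, query, k, predicate):
    """Apply the metadata filter first, then search only the matching subset."""
    return top_k([it for it in items if predicate(it)], query, k)

# Hypothetical toy data: the vectors closest to the query are not jazz.
ITEMS = [
    {"id": 1, "vec": [1.0, 0.0], "genre": "rock", "year": 1995},
    {"id": 2, "vec": [0.9, 0.1], "genre": "rock", "year": 1998},
    {"id": 3, "vec": [0.8, 0.2], "genre": "pop", "year": 1992},
    {"id": 4, "vec": [0.1, 0.9], "genre": "jazz", "year": 1994},
    {"id": 5, "vec": [0.0, 1.0], "genre": "jazz", "year": 2005},
]

def is_90s_jazz(item):
    return item["genre"] == "jazz" and 1990 <= item["year"] <= 2000

QUERY = [1.0, 0.0]
post_ids = [it["id"] for it in post_filter(ITEMS, QUERY, k=1, overfetch=3, predicate=is_90s_jazz)]
pre_ids = [it["id"] for it in pre_filter(ITEMS, QUERY, k=1, predicate=is_90s_jazz)]
print(post_ids)  # [] : the 3 nearest items are all filtered out
print(pre_ids)   # [4] : the only 1990s jazz item is found
```

With these made-up numbers, post-filtering returns nothing even though a matching item exists, which is the failure mode the question worries about. Pre-filtering finds it, at the cost of searching a subset that a real approximate index may not be built to handle.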

It seems like measuring precision and recall for hybrid queries would be illuminating.
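One way such a measurement could look, as a minimal sketch: treat the exact filtered top-k as ground truth and compare the IDs a database returns against it. The function name `precision_recall` and the conventions for empty sets are assumptions made for illustration, not an established benchmark.

```python
def precision_recall(returned_ids, relevant_ids):
    """Precision and recall of a hybrid query result.

    returned_ids: IDs the system under test returned.
    relevant_ids: ground-truth IDs, e.g. the exact top-k among items
                  that satisfy the metadata filter.
    Assumed conventions: precision is 0.0 when nothing is returned,
    recall is 1.0 when there is nothing to find.
    """
    returned, relevant = set(returned_ids), set(relevant_ids)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

# Hypothetical example: two of the three true neighbours were returned.
print(precision_recall([4, 9, 12], [4, 7, 9]))

# A post-filter that wiped out every candidate scores zero on both.
print(precision_recall([], [4]))
```

Running this over many queries with filters of varying selectivity would show how quickly result quality degrades as the filter becomes more restrictive.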

5 comments

andre-z 994 days ago

There is an on-stage filtering approach with an extended HNSW: https://qdrant.tech/articles/filtrable-hnsw/

mvcalder 994 days ago

I can't speak to the others, but pgvector indices can "break" hybrid queries. For example, suppose you select using a where clause specifying metadata (where genre = jazz) and order by distance from a vector (the embedding of a sound clip). If the index doesn't have many (or any) vectors in the neighborhood of the query vector that also match the metadata, the query can return no results. I discuss this in a blog post here [1].

[1]: https://www.polyscale.ai/blog/pgvector-bigger-boat/

prabhatjha 993 days ago

You can totally do this in Cassandra. See https://docs.datastax.com/en/astra-serverless/docs/vector-se...

gk1 993 days ago

What you’re describing is easily done in Pinecone, and in other solutions as well. See: https://docs.pinecone.io/docs/metadata-filtering

mistrial9 994 days ago

> do hybrid queries

"no" - the graph objects after training are opaque AFAIK

hobs 994 days ago

Actually a lot of the databases offer filtering before or after similarity search.

esafak 994 days ago

I'd say it's table stakes today.
