by jhj
892 days ago

I don't know why PQ is listed as an "indexing strategy". It's a vector compression/quantization technique, not a means of partitioning the search space. You could encode vectors with PQ when using a brute-force/flat index, an IVF index, or HNSW (Faiss provides all three with PQ encoding, as IndexPQ, IndexIVFPQ and IndexHNSWPQ respectively), or even with k-D trees or ANNOY if someone wanted to do that.

"Use HNSW or Annoy for very large datasets where query speed is more important than precision": graph-based methods have huge memory overhead and construction cost, and they aren't practical for billion-scale datasets. Also, they will usually be both more accurate and faster than IVF techniques, because IVF has to visit a large number of cells to reach comparable accuracy. IVF, on the other hand, can scale to trillion-sized databases without much overhead while still giving reasonable speed/accuracy tradeoffs, which the other techniques can't. I'd phrase it as "use for medium-scale datasets where query speed is important, yet high accuracy is still desired and flat/brute-force indexing is impractical".
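To illustrate that PQ is an encoding that sits underneath a search structure rather than a partitioning scheme, here is a minimal pure-Python sketch. It is illustrative only, not how Faiss implements any of this, and `ProductQuantizer`, `flat_pq_search` and `IVFPQ` are hypothetical names. The same trained PQ codec is used both by a brute-force scan over codes and by an IVF partition whose inverted lists store those codes. Visiting all cells (`nprobe == nlist`) gives exactly the flat-scan result; visiting fewer is faster but can miss neighbors.

```python
import random

# Illustrative sketch only: PQ as a vector *encoding* shared by two search
# structures (flat scan and IVF). Not Faiss's implementation; names are hypothetical.

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))

def kmeans(points, k, iters=10, rng=None):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(0) if rng is None else rng
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, b in enumerate(buckets):
            if b:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(xs) / len(b) for xs in zip(*b)]
    return centroids

class ProductQuantizer:
    """Splits a d-dim vector into m sub-vectors and stores, for each one, the id
    of its nearest centroid in a k-entry codebook: m small ints per vector."""

    def __init__(self, d, m, k):
        assert d % m == 0, "d must be divisible by m"
        self.d, self.m, self.k, self.ds = d, m, k, d // m

    def _sub(self, v, i):
        return v[i * self.ds:(i + 1) * self.ds]

    def train(self, data, seed=0):
        rng = random.Random(seed)
        self.codebooks = [kmeans([self._sub(v, i) for v in data], self.k, rng=rng)
                          for i in range(self.m)]

    def encode(self, v):
        return tuple(nearest(self._sub(v, i), self.codebooks[i]) for i in range(self.m))

    def decode(self, code):
        return [x for i, c in enumerate(code) for x in self.codebooks[i][c]]

    def distance_table(self, q):
        # Asymmetric distance computation: the query stays uncompressed, and
        # ||q_i - c_ij||^2 is precomputed for every sub-space i and centroid j.
        return [[sqdist(self._sub(q, i), c) for c in cb]
                for i, cb in enumerate(self.codebooks)]

    @staticmethod
    def adc(table, code):
        # Approximate squared distance = sum of m table lookups.
        return sum(table[i][c] for i, c in enumerate(code))

def flat_pq_search(pq, codes, q, topk):
    """Brute force over every PQ code (the idea behind a flat PQ index)."""
    table = pq.distance_table(q)
    return sorted(range(len(codes)), key=lambda i: pq.adc(table, codes[i]))[:topk]

class IVFPQ:
    """Coarse k-means partition (IVF) whose inverted lists hold PQ codes.
    Simplification: raw vectors are encoded here, whereas IVF+PQ indexes
    typically encode the residual to the coarse centroid."""

    def __init__(self, pq, nlist):
        self.pq, self.nlist, self.ntotal = pq, nlist, 0
        self.lists = [[] for _ in range(nlist)]

    def train(self, data, seed=1):
        # Only the coarse quantizer is trained here; the PQ codec is shared.
        self.coarse = kmeans(data, self.nlist, rng=random.Random(seed))

    def add(self, vectors):
        for v in vectors:
            self.lists[nearest(v, self.coarse)].append((self.ntotal, self.pq.encode(v)))
            self.ntotal += 1

    def search(self, q, topk, nprobe):
        # Visit only the nprobe closest cells; larger nprobe trades speed for recall.
        probe = sorted(range(self.nlist), key=lambda j: sqdist(q, self.coarse[j]))[:nprobe]
        table = self.pq.distance_table(q)
        cands = [(self.pq.adc(table, code), vid)
                 for j in probe for vid, code in self.lists[j]]
        return [vid for _, vid in sorted(cands)[:topk]]
```

Usage: train one `ProductQuantizer` (e.g. `d=8, m=4, k=16`), encode the data once for `flat_pq_search`, and hand the same codec to `IVFPQ`. Swapping the IVF partition for a graph or a tree would leave the PQ part untouched, which is the point being made about PQ not being an indexing strategy on its own.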