The implementations that I am aware of, including the one in Faiss which I wrote (described in detail in https://arxiv.org/abs/1702.08734), do not index the vector based on its PQ encoding (e.g., in IVFPQ). The IVF cell into which a vector is placed is chosen based on its unquantized (full-precision floating-point) representation. Performing all comparisons in the compressed space would lose too much precision.
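As a rough illustration of that point (a minimal NumPy sketch, not Faiss's actual code), cell assignment can be written as a nearest-centroid search over the full-precision vector. The names `coarse_centroids` and `assign_ivf_cell` are hypothetical, and the random centroids stand in for ones that would normally be trained with k-means.

```python
import numpy as np

rng = np.random.default_rng(0)
d, nlist = 8, 4  # vector dimension, number of IVF cells (illustrative values)

# Hypothetical coarse centroids; in practice these come from k-means training.
coarse_centroids = rng.normal(size=(nlist, d)).astype(np.float32)

def assign_ivf_cell(x, centroids):
    """Pick the IVF cell from the full-precision vector, not from its PQ code."""
    dists = ((centroids - x) ** 2).sum(axis=1)  # squared L2 to every centroid
    return int(np.argmin(dists))

x = rng.normal(size=d).astype(np.float32)
cell = assign_ivf_cell(x, coarse_centroids)
```

Only after the cell is chosen would the vector be PQ-encoded for storage in that cell's list.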

Also, distance comparisons are usually of the "ADC" (asymmetric distance computation) form: the query vector is in full floating point format, and the database vectors, if quantized/compressed, are effectively decompressed and compared in the full floating point domain. This is true even with PQ: for each subspace, the distances between the query sub-vector and each of the 2^n possible PQ codes for that subspace (with n bits per code) are precomputed before comparison, so the distance computation becomes a lookup-add over the precomputed distance tables.
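To make the lookup-add concrete, here is a small NumPy sketch under assumed toy sizes. It is not the Faiss implementation: `codebooks`, `codes`, and `adc_distances` are hypothetical names, and the codebooks are random rather than trained. It also checks that the table-based result matches explicitly decompressing the codes and comparing in floating point.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, nbits = 8, 4, 3          # dimension, number of subspaces, bits per code
K, dsub = 2 ** nbits, d // M   # codes per subspace, sub-vector dimension

# Hypothetical PQ codebooks (random here; normally trained per subspace).
codebooks = rng.normal(size=(M, K, dsub)).astype(np.float32)
codes = rng.integers(0, K, size=(5, M))      # 5 PQ-encoded database vectors
query = rng.normal(size=d).astype(np.float32)

def adc_distances(query, codebooks, codes):
    """Squared L2 from a float query to PQ-coded vectors via lookup tables."""
    M, K, dsub = codebooks.shape
    q_sub = query.reshape(M, 1, dsub)
    # table[m, k] = distance from query sub-vector m to code k of subspace m
    table = ((codebooks - q_sub) ** 2).sum(axis=2)
    # Lookup-add: one table entry per subspace, summed per database vector.
    return table[np.arange(M), codes].sum(axis=1)

adc = adc_distances(query, codebooks, codes)

# Reference: decompress each code and compare in the floating point domain.
decoded = codebooks[np.arange(M), codes].reshape(len(codes), d)
reference = ((decoded - query) ** 2).sum(axis=1)
```

The table costs M * 2^n sub-vector distances per query, after which each database vector needs only M lookups and adds.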

LSH techniques, unlike PQ, are more accurately described as indexing techniques, since the buckets into which vectors are placed are determined by their encoding (via hashing/binarization), and vectors are compared entirely in that encoded space.
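For contrast, a minimal sketch of one common LSH variant (random-hyperplane sign hashing, chosen here only as an example; the comment above does not name a specific scheme) shows the bucket being derived directly from the binary code. The names `hyperplanes`, `lsh_code`, and `buckets` are hypothetical.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, nbits = 8, 4
hyperplanes = rng.normal(size=(nbits, d))  # one random hyperplane per bit

def lsh_code(x, hyperplanes):
    """Binarize x by the side of each hyperplane it falls on; pack into an int."""
    bits = (hyperplanes @ x) > 0
    return sum(int(b) << i for i, b in enumerate(bits))

database = rng.normal(size=(20, d))

# The bucket key is the code itself, so indexing happens in the encoded space.
buckets = defaultdict(list)
for i, v in enumerate(database):
    buckets[lsh_code(v, hyperplanes)].append(i)

query = database[0] * 2.0  # same direction as database[0], so same sign pattern
candidates = buckets[lsh_code(query, hyperplanes)]
```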