Hacker News new | ask | show | jobs
by jhj 1144 days ago
Regarding reduced precision, depending upon what you are trying to do, I think it doesn't work quite as well in similarity search as it does for, say, neural networks.

If you are concerned about recall of the true nearest neighbor (k=1), in many datasets I've seen (especially large ones) in float32 the distance from a query vector to candidate nearest neighbor vectors may only differ by some thousands of ULPs when performing brute force search, which if done in float16 would result in the true nearest neighbor being the same as (or perhaps behind even, due to rounding error) other proximate vectors. If you are performing approximate lookup and you have the luxury of performing reranking (you store the compressed / approximate index for lookup, but return a larger candidate set like k=100 or k=1000 and refine the results based on true distances computed from the uncompressed vectors via brute-force, so you have to keep all the original vectors around) then this problem can go away.

If however you are looking at recall@100 (is the true nearest neighbor reported within the top k=100) or set intersection (of the k=100 approximate nearest neighbors, how much overlap is there with the true set of the 100 nearest neighbors), then this doesn't matter as much.

Certainly a lot of the options in the Faiss library are geared towards compression and quantization anyways (e.g., storing a billion high-dimensional vector dataset in 8/16 GB of memory) so this is a tradeoff as with everything else.

In Faiss there are the scalar quantizer indexes which do store vectors in int4/int8 etc for which int8 GEMM (accumulate in int32) would be great, but using this would require that the query vector itself be quantized to int4/int8 as well. This is the difference between "asymmetric distance computation" (ADC) where you compute the distance between int4 encoded vectors (or product quantized encoded vectors, etc) versus a float32 query vector, where we reconstruct the database vectors in floating point and compare in floating point, versus symmetric distance computation (you have to convert the query vector to int4 say, and compare in the quantized regime). ADC tends to work a lot better than symmetric computation, so this is why we don't use pure int8 GEMM, but maybe in many applications (NN inference, say, instead of image database search) the non-ADC comparison would be ok.

1 comments

Thanks for the very helpful and detailed reply!