by pbadams
1119 days ago
There are a lot of directions people try to go, making different tradeoffs between the complexity of the clustering, the loss from the quantization, and the impact on performance (esp. trying to get some subset of the lookup tables to fit in cache). Readers might be interested in [1], which gives a survey of some of the directions. In general, though, PQ is a pretty good baseline. I'm glad all these vector DB companies seem to have decided that the best form of marketing is high-quality summaries/tutorials about fundamental concepts; it's a good contribution to the community.
[1] Fig. 1 in https://www.jstage.jst.go.jp/article/mta/6/1/6_2/_pdf (2018)
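For anyone who wants to see where those tradeoffs show up, here is a minimal sketch of plain PQ in stdlib Python. It is not taken from [1] or from any particular library: the function names (`train_pq`, `encode`, `adc_table`, and so on), the toy Gaussian data, and the choices of `m=4` subspaces and `k=16` centroids are all assumptions made for illustration, and the k-means is deliberately naive.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Naive Lloyd's k-means; returns k centroids (illustrative, not tuned)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(v, m):
    """Cut v into m contiguous subvectors (assumes len(v) is divisible by m)."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]

def train_pq(data, m, k):
    """One codebook of k centroids per subspace: this is the clustering cost."""
    return [kmeans([split(v, m)[i] for v in data], k, seed=i) for i in range(m)]

def encode(v, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(v, len(codebooks)), codebooks)]

def decode(code, codebooks):
    """Approximate reconstruction: concatenate the selected centroids."""
    return [x for idx, cb in zip(code, codebooks) for x in cb[idx]]

def adc_table(query, codebooks):
    """Per-query m-by-k table of subspace distances (the part you want in cache)."""
    return [[sq_dist(sub, c) for c in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]

def adc_distance(code, table):
    """Asymmetric distance: m table lookups instead of a full distance computation."""
    return sum(table[i][idx] for i, idx in enumerate(code))

# Toy data: 200 random 8-dimensional vectors (an assumption for the demo).
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]

codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]

# Quantization loss: mean squared reconstruction error over the dataset.
loss = sum(sq_dist(v, decode(c, codebooks)) for v, c in zip(data, codes)) / len(data)

query = data[0]
table = adc_table(query, codebooks)
nearest = min(range(len(codes)), key=lambda i: adc_distance(codes[i], table))
print("mean quantization loss:", loss)
print("nearest by ADC:", nearest)
```

The three knobs from the comment map onto this directly: `m` and `k` set how much clustering work `train_pq` does, `loss` is what the quantization throws away, and the table from `adc_table` has `m * k` entries per query, which is the thing you would like to keep small enough to stay in cache.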
As great as HNSW and other graph-based indexes are, I think PQ and other encoder/decoder-based methods are still incredibly important for ANN search in general. In particular, it should be possible to learn some sort of joint encoding with neural networks targeted towards different modalities.