|
|
|
|
|
by gajjanag
47 days ago
|
|
There are also more papers on similar themes. For example, TurboQuant makes use of QJL (quantized Johnson Lindenstrauss transformations). One of the first papers to characterize the QJL and in fact the rate distortion tradeoff for quantized matrix multiplication in general is "Optimal Quantization for Matrix Multiplication" (https://arxiv.org/abs/2410.13780) by Ordentlich and Polyanskiy. There is also a more accessible survey paper around quantized matrix multiplication called "High-Rate Quantized Matrix Multiplication: Theory and Practice" (https://arxiv.org/abs/2601.17187), by the same authors. TurboQuant cites none of them. |
|
The attribution is thin, the “6x compression” headline is not clearly separated from prior KV-cache quantization baselines like KIVI, and the RaBitQ comparison is hard to take seriously: single-core CPU for the baseline, A100 GPU for TurboQuant. It is comparing apples-to-datacenter. Worse, there are also public OpenReview comments saying that even the reported accuracy results are not reproducible.
Hard to believe this is the standard for something being promoted as a breakthrough. If this came from a random startup blog, people would be much harsher about it.