by devit
946 days ago
Based on the first figure in the paper, it seems that this scheme effectively turns 8 input values into a 4-bit number, giving an effective 0.5 bits per value. Considering that current aggressive quantization for LLM transformers uses 4 bits, does such a 0.5-bit quantization produce an effective neural network? Does the scheme stay competitive if it is changed to use 4 bits per value instead of 0.5?
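To make the arithmetic behind that 0.5-bit figure concrete, here is a rough sketch. It assumes the reading above is right (one 4-bit code, i.e. 16 centroids, per group of 8 values); the function names `bits_per_value` and `centroids_needed` are made up for illustration and are not from the paper.

```python
import math

def bits_per_value(subvector_len: int, num_centroids: int) -> float:
    """Bits spent per input value when one code indexes a whole subvector."""
    return math.log2(num_centroids) / subvector_len

def centroids_needed(subvector_len: int, target_bits_per_value: int) -> int:
    """Codebook size required to reach a given bits-per-value budget."""
    return 2 ** (subvector_len * target_bits_per_value)

# Assumed setup from the comment: 8 values share one 4-bit code (16 centroids).
print(bits_per_value(8, 16))    # 0.5
# Matching a 4-bit-per-value budget with 8-value groups needs a 32-bit code.
print(centroids_needed(8, 4))   # 4294967296
```

So getting to 4 bits per value while keeping 8-value groups would mean a codebook of 2^32 entries, which suggests the group size or the number of codebooks would have to change instead.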
Also, most scalar quantization methods use uniform quantization (e.g., dividing the range between the scalar lower bound L and the scalar upper bound H into N regions, where N is usually 2^bit_width), whereas PQ (and VQ) is learned quantization via k-means on some training vector set, so the two aren't really directly comparable.
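A minimal sketch of the two approaches side by side, in case it helps. This is only an illustration under my own assumptions: plain Lloyd-style k-means in NumPy, a single codebook for one subvector (real PQ keeps one codebook per subvector slice), and hypothetical function names `uniform_quantize`, `kmeans_codebook` and `vq_encode`.

```python
import numpy as np

def uniform_quantize(x, low, high, bit_width):
    """Uniform scalar quantization: split [low, high] into 2**bit_width regions."""
    n = 2 ** bit_width
    step = (high - low) / n
    idx = np.clip(np.floor((x - low) / step), 0, n - 1).astype(np.int64)
    # Return the region index and the midpoint of that region as reconstruction.
    return idx, low + (idx + 0.5) * step

def kmeans_codebook(train, k, iters=20, seed=0):
    """Learn k centroids from training vectors with plain Lloyd iterations."""
    rng = np.random.default_rng(seed)
    centroids = train[rng.choice(len(train), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every training vector to every centroid.
        dist = ((train[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = train[assign == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

def vq_encode(x, centroids):
    """Map each vector to the index of its nearest learned centroid."""
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

# Uniform: the regions are fixed by L, H and the bit width, no data needed.
idx, recon = uniform_quantize(np.array([0.0, 0.49, 1.0]), 0.0, 1.0, 2)

# Learned: the codebook depends entirely on the training set.
train = np.random.default_rng(1).normal(size=(1000, 8))
codebook = kmeans_codebook(train, k=16)
codes = vq_encode(train, codebook)  # one 4-bit code per 8-value vector
```

The uniform quantizer is fully determined by its range and bit width, while the learned codebook changes with the training vectors, which is the main reason a head-to-head comparison at the same bit budget is not straightforward.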