by woadwarrior01
82 days ago
This is a very well-established idea. It's called dynamic quantization: vary the quantization bit-width (or skip quantization altogether) on a layer-by-layer basis, using a calibration dataset. EvoPress is the first thing that comes to mind when I think of dynamic quantization. https://arxiv.org/abs/2410.14649
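
To make the idea concrete, here is a rough toy sketch of per-layer bit-width selection driven by a calibration set. It is not EvoPress (which uses an evolutionary search); it swaps in a simple greedy heuristic. Every name in it (`quantize`, `calibration_error`, `assign_bit_widths`, the `avg_bits` budget) is made up for illustration, and it assumes each "layer" is just an independent weight matrix fed the same calibration inputs, which a real network would not be.

```python
import random

def quantize(row, bits):
    """Uniform symmetric quantization of a list of floats to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in row) / levels or 1.0  # avoid a zero scale
    return [round(w / scale) * scale for w in row]

def layer_output(weights, x):
    """Matrix-vector product for a layer stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def calibration_error(weights, bits, calib):
    """Mean squared output error of the quantized layer on the calibration inputs."""
    q = [quantize(row, bits) for row in weights]
    total, count = 0.0, 0
    for x in calib:
        ref, out = layer_output(weights, x), layer_output(q, x)
        total += sum((a - b) ** 2 for a, b in zip(ref, out))
        count += len(ref)
    return total / count

def assign_bit_widths(layers, calib, choices=(2, 4, 8), avg_bits=4.0):
    """Greedy heuristic (an assumption, not EvoPress): start every layer at the
    lowest bit-width, then repeatedly upgrade the layer with the largest error
    reduction per extra bit spent, while staying within the average-bit budget."""
    errors = [{b: calibration_error(w, b, calib) for b in choices} for w in layers]
    sizes = [len(w) * len(w[0]) for w in layers]
    budget = avg_bits * sum(sizes)
    level = [0] * len(layers)
    used = sum(choices[0] * s for s in sizes)
    while True:
        best = None
        for i, lv in enumerate(level):
            if lv + 1 == len(choices):
                continue  # already at the highest bit-width
            cur, nxt = choices[lv], choices[lv + 1]
            cost = (nxt - cur) * sizes[i]
            if used + cost > budget:
                continue  # upgrade would exceed the budget
            gain = (errors[i][cur] - errors[i][nxt]) / cost
            if best is None or gain > best[0]:
                best = (gain, i, cost)
        if best is None:
            break
        _, i, cost = best
        level[i] += 1
        used += cost
    return [choices[lv] for lv in level]

# Toy data: three 8x8 layers with very different weight scales.
random.seed(0)
layers = [[[random.gauss(0, s) for _ in range(8)] for _ in range(8)]
          for s in (0.1, 1.0, 5.0)]
calib = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
bits = assign_bit_widths(layers, calib, avg_bits=5.0)
print(bits)  # one bit-width per layer
```

Skipping quantization for a layer could be modelled the same way, by adding a full-precision option with zero error and a correspondingly higher cost.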