by lappa
1010 days ago
Interesting how this method quantizes different layers and modules in a way that minimizes perplexity as it adjusts parameters. I'd like to see how 2.5-bit quantization compares to an unadjusted 4-bit baseline. It would also be interesting to see the usual benchmarks (ARC, HellaSwag, MMLU, TruthfulQA) for this method at different average bitrates (2.0, 2.5, 3.0, 4.0, ...), and whether an average bitrate of 4 is just as fast and small as a constant bitrate of 4, but more accurate. Very exciting work, looking forward to trying this out on my models!
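To make the "just as small" part of that question concrete, here is a rough sketch of how I'd compute a parameter-weighted average bitrate. The layer names, sizes, and bit widths are made up for illustration and are not taken from the method itself. It only counts raw weight bits, so it ignores scales, zero points, and other per-group metadata, and it says nothing about speed.

```python
# Hypothetical layers: (name, parameter count, bits per weight).
# These numbers are illustrative assumptions, not measurements.
layers = [
    ("attn", 4_000_000, 6),
    ("mlp_up", 8_000_000, 4),
    ("mlp_down", 8_000_000, 3),
]


def average_bitrate(layers):
    """Parameter-weighted mean of bits per weight across layers."""
    total_bits = sum(n * bits for _, n, bits in layers)
    total_params = sum(n for _, n, _ in layers)
    return total_bits / total_params


def size_bytes(layers):
    """Raw weight storage in bytes (no scales or other metadata)."""
    return sum(n * bits for _, n, bits in layers) / 8


# Same layers quantized at a constant 4 bits, for comparison.
constant_4bit = [(name, n, 4) for name, n, _ in layers]

print(average_bitrate(layers))
print(size_bytes(layers), size_bytes(constant_4bit))
```

Under these assumptions the mixed allocation and the constant 4-bit allocation take the same raw weight storage, so any real size difference would come from the metadata overhead, which this sketch leaves out.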