|
|
|
|
|
by int_19h
315 days ago
|
|
You can go below one byte per parameter. 4-bit quantization is fairly popular. It does affect quality - for some models more so than others - but, generally speaking, a 4-bit quantized model is still going to do significantly better than an 8-bit model with 1/2 parameters. |
|