by aarnphm
1096 days ago
Currently on main, 8-bit and 4-bit quantization are supported. You can simply run:

```
openllm start falcon --model-id tiiuae/falcon-40b-instruct --quantize int4
```

Beware that there is no free lunch: inference quality can degrade significantly when using int4 quantization.
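To make the "no free lunch" point concrete, here is a toy sketch, assuming plain symmetric per-tensor round-to-nearest quantization applied to synthetic Gaussian "weights". It is not how OpenLLM or its quantization backend actually quantizes (many real schemes use finer-grained, e.g. per-block, scales). It only shows that going from 8 bits to 4 bits leaves far fewer quantization levels, so the rounding error grows.

```python
import random


def quantize_dequantize(values, bits):
    """Toy symmetric per-tensor quantization: map to signed ints, then back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 levels per side for 8-bit, only 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        return list(values)
    # Round to the nearest integer level (Python's round uses ties-to-even),
    # clamp to the representable range, then scale back to float.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(original, reconstructed):
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)


random.seed(0)
# Hypothetical stand-in for a weight tensor: small values centred on zero.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

err8 = mean_abs_error(weights, quantize_dequantize(weights, 8))
err4 = mean_abs_error(weights, quantize_dequantize(weights, 4))
print(f"int8 mean abs error: {err8:.2e}")
print(f"int4 mean abs error: {err4:.2e}")
```

The int4 error comes out roughly an order of magnitude larger than the int8 error, because the step between levels is about 127 / 7 times wider. That extra error in every weight is where the quality loss comes from.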