by aarnphm 1096 days ago
Currently on main, 8-bit and 4-bit quantization are supported.

One can simply do

```
openllm start falcon --model-id tiiuae/falcon-40b-instruct --quantize int4
```

Beware that there is no free lunch: inference quality will degrade a lot when using int4 quantization.
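
To give a rough feel for why fewer bits cost accuracy, here is a toy sketch in plain Python. It assumes simple symmetric per-tensor round-to-nearest quantization on random Gaussian "weights", which is almost certainly not the scheme OpenLLM uses under the hood; the function names are made up for illustration. It only shows that the round-trip error grows when going from 8 bits to 4 bits, not how much model quality drops in practice.

```python
import random


def quantize_roundtrip(weights, bits):
    """Quantize to signed `bits`-bit integers (symmetric, one scale), then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    restored = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # integer code, clamped to range
        restored.append(q * scale)  # back to floating point
    return restored


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Hypothetical stand-in for a weight tensor: 10,000 Gaussian samples.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

err8 = mean_abs_error(weights, quantize_roundtrip(weights, 8))
err4 = mean_abs_error(weights, quantize_roundtrip(weights, 4))

print(f"int8 mean abs error: {err8:.5f}")
print(f"int4 mean abs error: {err4:.5f}")
```

With only 15 usable levels at 4 bits versus 255 at 8 bits, each weight lands further from its original value, so the int4 error should come out noticeably larger. Real quantizers use per-group scales and smarter rounding to reduce this gap, but they cannot remove it entirely.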