Hacker News

by aarnphm 1096 days ago

Currently, 8-bit and 4-bit quantization are supported on main.

One can simply run:

```bash
openllm start falcon --model-id tiiuae/falcon-40b-instruct --quantize int4
```

Beware that there is no free lunch: inference quality will degrade by a lot when using int4 quantization.
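Here is a rough idea of why int4 hurts more than int8. The toy sketch below uses symmetric, per-tensor, round-to-nearest quantization on synthetic Gaussian "weights". This is an illustrative assumption and not the scheme OpenLLM uses behind `--quantize`; practical 4-bit methods typically use finer-grained (e.g. per-block) scales to limit the damage. The helper names are hypothetical.

```python
import random

def quantize_dequantize(values, bits):
    """Toy symmetric per-tensor quantization: map floats to signed ints, then back."""
    qmax = 2 ** (bits - 1) - 1               # 127 levels per side for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]    # dequantize back to floats

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

random.seed(0)
# Synthetic stand-in for a weight tensor (hypothetical, not real Falcon weights).
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

err8 = mean_abs_error(weights, quantize_dequantize(weights, 8))
err4 = mean_abs_error(weights, quantize_dequantize(weights, 4))
print(f"int8 mean abs error: {err8:.2e}")
print(f"int4 mean abs error: {err4:.2e}")
```

With the same scale budget, the 4-bit grid is about 18x coarser (127 / 7), so the reconstruction error comes out roughly an order of magnitude larger than with 8 bits. That extra error in every weight is where the quality loss comes from.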