by boroboro4 534 days ago
Because INT4-quantized weights still use FP16 compute in most cases: the weights are stored in 4 bits but dequantized to FP16 before the matmul. Sometimes it's possible to use FP8/INT8 compute, and there is research on using INT4 compute, but it's rather rare.
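
To make that concrete, here is a rough NumPy sketch of weight-only INT4 with FP16 compute. It assumes symmetric per-group quantization with a group size of 32. The helper names (`quantize_int4`, `dequantize_fp16`, `linear_w4a16`) are made up for illustration and are not any library's API, and real kernels pack two 4-bit values per byte and fuse the dequantization into the matmul.

```python
import numpy as np

def quantize_int4(w, group_size=32):
    """Symmetric per-group INT4 quantization (hypothetical helper).

    Values are kept in an int8 array restricted to [-8, 7]; a real
    kernel would pack two of them into each byte.
    """
    groups = w.astype(np.float32).reshape(-1, group_size)
    # One scale per group, chosen so the largest magnitude maps to 7.
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid dividing by zero for all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_fp16(q, scale, shape):
    """Expand the 4-bit integer codes back to FP16 weights."""
    return (q.astype(np.float16) * scale).reshape(shape)

def linear_w4a16(x, q, scale, w_shape):
    """y = x @ W^T with 4-bit stored weights and FP16 operands."""
    w = dequantize_fp16(q, scale, w_shape)  # storage is INT4, operands are FP16
    return x @ w.T
```

The 4-bit format only reduces the memory and bandwidth needed for the weights. Both operands of the matmul are FP16, so the arithmetic itself is no cheaper than in an unquantized FP16 model.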