by storus 58 days ago
4-bit quantization is not applied to all layers; some are kept at 8 or 16 bits.
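
To make that concrete, here is a rough sketch of what a mixed-precision plan could look like and why the effective size ends up above 4 bits per weight. The layer names, parameter counts, and the choice of which layers stay at 8 or 16 bits are all made up for illustration; they are not taken from any particular quantization scheme.

```python
# Hypothetical layer names and parameter counts, purely illustrative.
LAYERS = {
    "embed_tokens": 1000,
    "blocks.0.attn": 4000,
    "blocks.0.mlp": 8000,
    "final_norm": 10,
    "lm_head": 1000,
}

# Assumed policy: layers listed here keep a higher bit width.
# Which layers are exempted varies by scheme; this mapping is an assumption.
HIGH_PRECISION = {"embed_tokens": 8, "lm_head": 8, "final_norm": 16}


def bits_for(name, default_bits=4):
    """Return the bit width for a layer: the override if present, else 4-bit."""
    return HIGH_PRECISION.get(name, default_bits)


def average_bits(layers):
    """Parameter-weighted average bits per weight across all layers."""
    total_params = sum(layers.values())
    total_bits = sum(count * bits_for(name) for name, count in layers.items())
    return total_bits / total_params


if __name__ == "__main__":
    for name in LAYERS:
        print(f"{name}: {bits_for(name)}-bit")
    print(f"average bits per weight: {average_bits(LAYERS):.2f}")
```

With these made-up numbers the average lands somewhere between 4 and 5 bits, which is why a "4-bit" file is usually somewhat larger than parameter count times 4 bits.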