by naasking 83 days ago
Newer quantization approaches are even better: 4 bits gets you no meaningful loss relative to FP16. See https://github.com/z-lab/paroquant

Hopefully Microsoft keeps pushing BitNet too, so only "1.58" bits are needed.
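
For context on the "1.58" figure: a ternary weight takes one of three values, so it carries log2(3) ≈ 1.58 bits. Here is a minimal sketch of ternary quantization, assuming the absmean scheme as I understand it from the BitNet b1.58 paper. The function name `ternary_quantize` and the `eps` value are my own and are not taken from any BitNet code.

```python
import math
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Map weights to {-1, 0, +1} with one float scale (absmean scheme, assumed)."""
    scale = np.abs(w).mean() + eps                 # mean absolute weight
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

# Three possible values per weight -> log2(3) bits of information each.
bits_per_weight = math.log2(3)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)     # toy weight matrix
q, scale = ternary_quantize(w)
w_hat = q * scale                                  # dequantized approximation
print(f"{bits_per_weight:.3f} bits/weight, values used: {np.unique(q)}")
```

This only illustrates the number format. Actually hitting 1.58 bits of storage needs a packing scheme, and as far as I know the BitNet models are trained with this quantization in the loop rather than converted afterwards.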

I think fractional representations are only relevant for training at this point, and bf16 is sufficient there; no need for fp4 and such.

1 comment

Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406
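
For anyone wondering why a rotation helps INT4 at all, here is a toy sketch. It uses a fixed Hadamard rotation as a stand-in, not the learned rotations of SpinQuant or ParoQuant, and the helper names `quantize_int4` and `hadamard` are hypothetical. The idea is that one outlier forces a large quantization step, and an orthogonal rotation spreads that outlier across all coordinates before rounding.

```python
import numpy as np

def quantize_int4(x):
    """Symmetric per-tensor INT4: round to integers in [-8, 7], then dequantize."""
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale

def hadamard(n):
    """Normalized Sylvester Hadamard matrix (orthogonal); n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.kron(h, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return h / np.sqrt(n)

# One large outlier plus small values that plain INT4 rounds to zero.
x = np.array([8.0, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5])
R = hadamard(x.size)

# Plain: quantize directly.
err_plain = np.sum((quantize_int4(x) - x) ** 2)

# Rotated: rotate, quantize, rotate back (R is orthogonal, so R.T undoes R).
x_rot_hat = R.T @ quantize_int4(R @ x)
err_rot = np.sum((x_rot_hat - x) ** 2)

print(f"squared error plain: {err_plain:.3f}, rotated: {err_rot:.3f}")
```

In a real model the rotation is applied to weights and activations so that it cancels out in the matrix product; this sketch skips that and only shows the effect on rounding error for a single vector.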

In my personal opinion I don’t think the 1.58 bit work is going to make it into the mainstream.

Not sure why you think fractional representations are only useful for training? Being able to natively compute in lower precisions can be a huge performance boost at inference time.

> Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406

Indeed, but much better! More accurate, less time and space overhead, beats AWQ on almost every bench. I hope it becomes the standard.

> In my personal opinion I don’t think the 1.58 bit work is going to make it into the mainstream.

I hope you're wrong! I'm more optimistic. Definitely a bit more work to be done, but still very promising.

> Being able to natively compute in lower precisions can be a huge performance boost at inference time.

ParoQuant is barely worse than FP16. Any less precise fractional representation is going to be worse than just using that IMO.