by naasking
83 days ago
> Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406

Indeed, but much better! More accurate, less time and space overhead, beats AWQ on almost every bench. I hope it becomes the standard.

> In my personal opinion I don’t think the 1.58 bit work is going to make it into the mainstream.

I hope you're wrong! I'm more optimistic. Definitely a bit more work to be done, but still very promising.

> Being able to natively compute in lower precisions can be a huge performance boost at inference time.

ParoQuant is barely worse than FP16. Any less precise fractional representation is going to be worse than just using that IMO.