Hacker News
by nisten 606 days ago
It's pretty interesting that the new SpinQuant method did not manage to beat good old 4-bit NF4 QLoRA training (Tim Dettmers really cooked with that one).

Really appreciate that Meta published both results+model quants and didn't just make some bs claim about a new sota quant like most other bigger companies would've done.

6 comments

Aside from the weirdness of calling something released 17 months ago "good old" :-D I mean, deep learning is evolving at a crazy pace, but you just can't assume a good paper gets written in days.

That said, as others have pointed out, and as it's also written on the blog post, they are entirely different methods. QLoRA requires access to the full training data, while theoretically you can apply SpinQuant to any given model. For example, they also apply it to Mistral, not only to their LLaMA.

(QLoRA also takes some time and compute to apply, but since SpinQuant also implies learning some weights, I don't know if it's actually faster/cheaper, too)
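
To make the "apply it to any given model" point concrete, here is a rough sketch of the rotation idea behind SpinQuant-style post-training quantization. It assumes a fixed, normalized Hadamard matrix as a stand-in for the rotations SpinQuant actually learns, and the `weights` and `activations` values are made up for illustration. It only shows that an orthogonal rotation spreads an outlier across all entries (shrinking the range a quantizer has to cover) while leaving the dot product unchanged; it is not the SpinQuant algorithm itself.

```python
import math

def hadamard(n):
    """Build an n x n normalized Hadamard matrix (n must be a power of two)."""
    h = [[1.0]]
    while len(h) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in h]

def matvec(m, x):
    """Multiply matrix m by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Hypothetical values: one large outlier among small weights.
weights = [0.1, -0.2, 0.15, 8.0, -0.05, 0.12, -0.18, 0.07]
activations = [1.0, -0.5, 0.25, 2.0, 0.0, -1.0, 0.5, 0.75]

R = hadamard(8)                  # orthogonal, so it preserves dot products
rot_w = matvec(R, weights)       # rotated weights: the outlier is spread out
rot_x = matvec(R, activations)   # activations rotated the same way

# Largest magnitude before and after rotation.
print(max(abs(v) for v in weights), max(abs(v) for v in rot_w))
# Layer output before and after rotation (equal up to float rounding).
print(dot(activations, weights), dot(rot_x, rot_w))
```

No training data is touched here, which is the sense in which this family of methods can be applied to an existing checkpoint; learning the rotation, as the comment above notes, still costs some compute.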

It’s a little bizarre that I feel like I’m actually starting to respect this little bit of Meta…
I think Meta, and Facebook before it, have always valued a very high standard of engineering, and have also been generally pretty good about open sourcing a lot of that work in a way that allows a lot of people to work with their tools. This doesn't seem all that out of character.
It's a huge company with a lot of different voices. One may create React and open source it, while another would add a clause that if you sue Facebook over anything, your React license disappears. When they are good they are really good.
The naming is unfortunate, but in this blog post QLoRA refers to quantization-aware training (QAT) with LoRA adaptors.
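
For readers unfamiliar with the pattern, a minimal sketch of what "quantized base plus LoRA adaptors" looks like in a forward pass follows. It assumes the general LoRA form y = W_q x + B(Ax) with a frozen, fake-quantized base weight and a zero-initialized B; the helper names, the 4-bit symmetric grid, and the toy matrices are all illustrative assumptions, not Meta's actual recipe.

```python
def fake_quant(w, bits=4):
    """Round each row to a symmetric integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4 bits
    out = []
    for row in w:
        scale = max(abs(v) for v in row) / qmax or 1.0  # avoid a zero scale
        out.append([round(v / scale) * scale for v in row])
    return out

def matvec(m, x):
    """Multiply matrix m by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def lora_forward(w_q, a, b, x):
    """y = W_q x + B (A x): quantized base output plus a low-rank update."""
    base = matvec(w_q, x)
    update = matvec(b, matvec(a, x))
    return [u + v for u, v in zip(base, update)]

# Hypothetical toy shapes: 2x3 base weight, rank-1 adaptor.
W = [[0.7, -0.3, 0.1], [0.2, 0.5, -0.6]]
A = [[0.01, -0.02, 0.03]]   # 1x3, small initial values
B = [[0.0], [0.0]]          # 2x1, zero init so the adaptor starts as a no-op
x = [1.0, 2.0, 3.0]

W_q = fake_quant(W)                 # base weights stay on the 4-bit grid
y0 = lora_forward(W_q, A, B, x)     # equals the quantized base output at init
print(y0)
```

During training only A and B would receive gradient updates in this sketch, which is why the approach needs training data and compute that a one-shot post-training method does not.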
I think the benefit is that SpinQuant had higher throughput and required less memory. At least according to the tables at the bottom of the article.

Definitely nice to see them not cherry-pick results - admitting it's not the best along all axes makes them more believable.

Those are different approaches afaict.
I mean, it's no free lunch: you still need to expend significantly more compute for the QLoRA training than for any usual post-training quantization (PTQ) method, be it SpinQuant or a more conventional quantization approach.