> LoRA does exactly the same thing as normal fine-tuning
You wrote exactly so I'm going to say "no". To clarify what I mean: LoRA seeks to accomplish a similar goal as "vanilla" fine-tuning but with a different method (freezing existing model weights while adding adapter matrices that get added to the original). LoRA isn't exactly the same mathematically either; it is a low-rank approximation (as you know).
> LoRA doesn't add "isolated subnetworks"
If you think charitably, the author is right. LoRA weights are isolated in the sense that they are separate from the base model. See e.g. https://www.vellum.ai/blog/how-we-reduced-cost-of-a-fine-tun...
"The end result is we now have a small adapter that can be added to the base model to achieve high performance on the target task. Swapping only the LoRA weights instead of all parameters allows cheaper switching between tasks. Multiple customized models can be created on one GPU and swapped in and out easily."
> you can merge your LoRA adapter into the original weights (by doing "W = W_{0} + ∆W") which most people do
Yes, one can do that. But on what basis do you say that "most people do"? Without having collected a sample of usage myself, I would just say this: there are many good reasons to not merge (e.g. see link above): less storage space if you have multiple adapters, easier to swap. On the other hand, if the extra adapter slows inference unacceptably, then don't.
> This highlights to me that the author doesn't know what they're talking about.
It seems to me you are being some combination of: uncharitable, overlooking another valid way of reading the text, being too quick to judge.
No, the author is objectively wrong. Let me quote the article and clarify myself:
> Fine-tuning advanced LLMs isn’t knowledge injection — it’s destructive overwriting. [...] When you fine-tune, you risk erasing valuable existing patterns, leading to unexpected and problematic downstream effects. [...] Instead, use modular methods like [...] adapters.
This is just incorrect. In this particular respect, LoRA is exactly like normal fine-tuning. The author's argument is that you should use LoRA because it doesn't do any "destructive overwriting", but in that respect it's no different from normal fine-tuning.
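To make the point concrete, here is a minimal NumPy sketch of a single linear layer with a LoRA-style update. Everything in it is an assumption for illustration: the shapes, the random "trained" factors `A` and `B`, and the omission of LoRA's usual scaling factor. It only shows that keeping `W0` frozen does not keep the layer's output frozen, and that the separate adapter and the merged weight compute the same function.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 8, 2                 # toy sizes, chosen arbitrarily

W0 = rng.normal(size=(d_out, d_in))      # frozen base weight (never modified)
B = rng.normal(size=(d_out, r))          # stand-in for a trained LoRA factor
A = rng.normal(size=(r, d_in))           # stand-in for a trained LoRA factor
delta_W = B @ A                          # low-rank update, rank <= r

x = rng.normal(size=d_in)                # an arbitrary input

y_base = W0 @ x                          # base model output
y_adapter = W0 @ x + B @ (A @ x)         # adapter kept separate at inference
y_merged = (W0 + delta_W) @ x            # adapter merged: W = W0 + delta_W

# Separate and merged adapters are the same function of x ...
same_function = np.allclose(y_adapter, y_merged)
# ... and that function differs from the base layer, frozen W0 or not.
changed_output = not np.allclose(y_base, y_adapter)
update_rank = np.linalg.matrix_rank(delta_W)

print(same_function, changed_output, update_rank)
```

The adapter being a separate file is a storage and deployment property. While it is active, the layer behaves as if its weights were `W0 + delta_W`, which is why LoRA doesn't sidestep the "overwriting" concern. How much that hurts in practice is an empirical question this toy can't answer.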
In fact, there's evidence that LoRA can actually make the problem worse[1]:
> we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions [...] LoRA fine-tuned models with intruder dimensions are inferior to fully fine-tuned models outside the adaptation task’s distribution, despite matching accuracy in distribution.
[1] -- https://arxiv.org/pdf/2410.21228
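For anyone who wants to poke at the idea, here is a rough sketch of the kind of check the quote describes, assuming (as I read the paper) that an intruder dimension is a singular vector of the tuned matrix whose best cosine similarity to any singular vector of the pretrained matrix falls below some threshold. The function name `count_intruders`, the threshold `eps=0.5`, and both synthetic "fine-tuned" matrices are my own illustrative choices, not the paper's setup, and the toy data is not evidence for or against the paper's claim.

```python
import numpy as np

def count_intruders(W_pre, W_tuned, eps=0.5):
    """Count left singular vectors of W_tuned that match no left singular
    vector of W_pre (max |cosine similarity| below eps)."""
    U_pre, _, _ = np.linalg.svd(W_pre)
    U_tuned, _, _ = np.linalg.svd(W_tuned)
    # Columns are unit vectors, so the dot products are cosine similarities.
    # sims[i, j] = |cos(pre vector i, tuned vector j)|
    sims = np.abs(U_pre.T @ U_tuned)
    best_match = sims.max(axis=0)        # best pretrained match per tuned vector
    return int(np.sum(best_match < eps))

n = 8
# Toy "pretrained" matrix with singular values 8..1 along the standard basis.
W_pre = np.diag(np.arange(n, 0, -1).astype(float))

# Synthetic large rank-1 update along a direction spread over all basis vectors.
u = np.ones((n, 1)) / np.sqrt(n)
W_low_rank = W_pre + 100.0 * (u @ u.T)

# Synthetic small dense perturbation that barely moves the singular vectors.
rng = np.random.default_rng(0)
W_small_dense = W_pre + 0.01 * rng.normal(size=(n, n))

intruders_low_rank = count_intruders(W_pre, W_low_rank)
intruders_small_dense = count_intruders(W_pre, W_small_dense)
print(intruders_low_rank, intruders_small_dense)
```

In this toy, the large rank-1 update introduces one new dominant direction that resembles none of the original singular vectors, while the small dense perturbation introduces none. Running the same function on real checkpoints would need the actual pretrained and fine-tuned weight matrices for a given layer.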
To be fair, "if you don't know what you're doing, use LoRA over normal fine-tuning" is, in general, good advice in my opinion. But that's not what the article is saying.
> But on what basis do you say that "most people do"?
On the basis of seeing what the common practice is, at least in the open (in the local LLM community and in the research space).
> I would just say this: there are many good reasons to not merge
I never said that there aren't good reasons to not merge.
> It seems to me you are being some combination of: uncharitable, overlooking another valid way of reading the text, being too quick to judge.
No, I'm just tired of constantly seeing a torrent of misinformation from people who neither know much about how these models actually work nor have done any significant work on their internals, yet try to write about them with authority.