by kouteiheika
261 days ago
> However, the literature is unclear on how well LoRA performs relative to FullFT.

I think the literature is clear on that? "LoRA vs Full Fine-tuning: An Illusion of Equivalence" -- https://arxiv.org/abs/2410.21228v1

Quoting from the conclusions:

> The paper describes the finding that LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution. We found that LoRA and full fine-tuning yield models with significant differences spectral properties of their weight matrices: LoRA models often containing “intruder dimensions”, high-ranking singular vectors approximately orthogonal to the singular vectors of pre-trained weight matrices. The existence of intruder dimensions correlates with the fine-tuned model forgetting more of the pre-training distribution as well as forgetting more when trained on tasks sequentially in a continual learning setup.

I'm surprised they didn't cite this; it's a well-known paper.
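For anyone wondering what "approximately orthogonal to the singular vectors of pre-trained weight matrices" means in practice, here is a rough NumPy sketch of how I read that definition. It is not the paper's code: the function name `count_intruder_dimensions`, the parameters `k` and `eps`, the use of left singular vectors from a reduced SVD, and the toy matrices are all my own assumptions for illustration.

```python
import numpy as np


def count_intruder_dimensions(w_pre, w_ft, k=10, eps=0.5):
    """Count top-k left singular vectors of w_ft that have cosine
    similarity below eps with every left singular vector of w_pre.

    Hypothetical helper: k and eps are illustrative choices.
    """
    u_pre, _, _ = np.linalg.svd(w_pre, full_matrices=False)
    u_ft, _, _ = np.linalg.svd(w_ft, full_matrices=False)
    # Singular vectors are unit length, so dot products are cosine
    # similarities; abs() because the sign of a singular vector is arbitrary.
    sims = np.abs(u_pre.T @ u_ft[:, :k])
    max_sim = sims.max(axis=0)  # best match for each fine-tuned vector
    return int((max_sim < eps).sum())


# Toy "pre-trained" weights: left singular vectors are the first four
# axes of a 6-dimensional space, with singular values 5, 4, 3, 2.
w_pre = np.eye(6, 4) * np.array([5.0, 4.0, 3.0, 2.0])

# Toy "fine-tuned" weights: a large rank-1 update along an axis that the
# pre-trained singular vectors do not cover.
w_ft = w_pre.copy()
w_ft[5, 0] = 10.0

n_intruders = count_intruder_dimensions(w_pre, w_ft, k=4, eps=0.5)
n_baseline = count_intruder_dimensions(w_pre, w_pre, k=4, eps=0.5)
print(n_intruders, n_baseline)
```

In the toy case the update creates a new top singular vector that points mostly along a direction the pre-trained matrix never used, so it gets flagged, while the unchanged matrix compared against itself has none. Real weight matrices would need a threshold and `k` chosen with more care than this.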