


	by viktour19 594 days ago
	> LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution.

	The ability of neural nets to generalize is inherently tied to their trainable parameter count, via mechanisms we don't understand, but we know parameter count is the key. When you finetune with LoRA, you're updating maybe 5% of the parameters. I really don't think there is an illusion of equivalence in the field.

3 comments

kelseyfrog 594 days ago

> When you finetune with LoRA, you're updating maybe 5% of the parameters

I'm not sure I understand this comment. The LoRA paper[1] specifically says that all of the pretrained weights remain frozen.

> keeping the pre-trained weights frozen

Specifically, the LoRA paper differentiates itself from updating some parameters by stating

> Many sought to mitigate this by adapting only some parameters or learning external modules for new tasks.

1. https://arxiv.org/pdf/2106.09685

viktour19 594 days ago

The effective parameters of the model are the original model's parameters plus the LoRA parameters, i.e., LoRA updates only the LoRA parameters, and full fine-tuning updates only the original model's parameters.
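
To put rough numbers on that split, here is a minimal sketch that counts frozen versus trainable parameters for a single linear layer with a LoRA adapter. The class name `LoRALinearCounts`, the 4096x4096 layer size, and rank 8 are illustrative assumptions, not figures from the paper or from this thread; the only thing taken from the LoRA setup is that the pretrained weight stays frozen and two low-rank factors are trained.

```python
from dataclasses import dataclass


@dataclass
class LoRALinearCounts:
    """Parameter bookkeeping for one linear layer with a LoRA adapter.

    Hypothetical helper for illustration only; not part of any library.
    """
    d_in: int
    d_out: int
    rank: int

    @property
    def frozen(self) -> int:
        # Pretrained weight W (d_out x d_in): frozen under LoRA,
        # but exactly what full fine-tuning would update.
        return self.d_in * self.d_out

    @property
    def trainable(self) -> int:
        # Low-rank factors B (d_out x rank) and A (rank x d_in):
        # the only parameters LoRA updates.
        return self.rank * (self.d_in + self.d_out)

    @property
    def trainable_fraction(self) -> float:
        # Share of the effective parameters (W plus A and B) that LoRA trains.
        return self.trainable / (self.frozen + self.trainable)


# Assumed sizes, chosen only to make the arithmetic concrete.
layer = LoRALinearCounts(d_in=4096, d_out=4096, rank=8)
print(f"frozen={layer.frozen} trainable={layer.trainable} "
      f"fraction={layer.trainable_fraction:.4%}")
```

Under these assumed sizes the trainable share of one layer comes out well below 1%; the exact figure for a whole model depends on which layers get adapters and on the chosen rank.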

abhgh 594 days ago

More magnitude than count [1] I think, but I haven't kept up in a while.

[1] https://proceedings.neurips.cc/paper_files/paper/1996/file/f...

wrs 594 days ago

Well, I think it depends on who you talk to. I suspect quite a few practitioners (as opposed to researchers) regard LoRA as a valid shortcut without fully considering the difference.
