by viktour19
594 days ago
> LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution. The ability for nnets to generalize is inherently tied to their trainable parameter count via mechanisms we don't understand but we know parameter count is the key. When you finetune with lora, you're updating maybe 5% of the parameters, I really don't think there is an illusion of equivalence in the field.

I'm not sure I understand this comment. The LoRA paper[1] specifically says that all of the pretrained weights remain frozen.
> keeping the pre-trained weights frozen
The paper also distinguishes LoRA from approaches that update only a subset of the existing parameters, which it describes as prior work:
> Many sought to mitigate this by adapting only some parameters or learning external modules for new tasks.
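To make the distinction concrete, here is a rough NumPy sketch of a single LoRA-adapted layer as I read the paper: the pretrained matrix `W0` is never touched, and the only trainable values are two new low-rank matrices `A` and `B`. The sizes `d`, `k`, `r` and the value of `alpha` are made-up illustrative numbers, not anything from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not taken from the paper's experiments).
d, k, r = 64, 64, 4   # output dim, input dim, LoRA rank
alpha = 8             # LoRA scaling hyperparameter

W0 = rng.normal(size=(d, k))              # pretrained weight: frozen, never updated
A = rng.normal(scale=0.01, size=(r, k))   # new trainable matrix, random init
B = np.zeros((d, r))                      # new trainable matrix, zero init

def lora_forward(x):
    # h = W0 x + (alpha / r) * B A x
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=k)

# Because B starts at zero, the adapted layer initially matches the frozen one.
matches_at_init = np.allclose(lora_forward(x), W0 @ x)

frozen = W0.size              # parameters that stay fixed
trainable = A.size + B.size   # parameters added and trained by LoRA
ratio = trainable / frozen
print(matches_at_init, frozen, trainable, ratio)
```

So a figure like "5% of the parameters" would describe the size of the added matrices relative to the frozen ones, not a fraction of the original weights being updated.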
[1] https://arxiv.org/pdf/2106.09685