

	by edwardjhu 1183 days ago
	Adapters are extra layers inserted between existing layers, so they can't be parallelized. LoRA reparametrizes the weight updates and is easily parallelized or merged with the original weights during inference. Also, if you let the rank r be the hidden size you roughly recover finetuning, so you can see LoRA as a generalization of the latter. Adding a task-specific layer and training only that layer doesn't work well. In practice, people combine many of these things, e.g., LoRA + task-specific final layer.
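The "merged with the original weights" point can be illustrated with a small sketch. It assumes the usual LoRA form, where a frozen weight W gets a low-rank update B·A, and it omits the scaling factor for simplicity. The sizes `d` and `r`, the random values, and the helper names are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def matvec(P, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def add(P, Q):
    """Element-wise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d, r = 4, 2                      # hypothetical hidden size and LoRA rank

W = rand_matrix(d, d)            # frozen pretrained weight
B = rand_matrix(d, r)            # low-rank factors (stand-ins for trained values)
A = rand_matrix(r, d)
x = [random.gauss(0, 1) for _ in range(d)]

# Unmerged: the frozen path W x and the low-rank path B (A x) are independent
# branches on the same input, so they can be computed side by side and summed.
y_unmerged = [w + u for w, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged: fold the update into the weight once, then run a single matmul.
W_merged = add(W, matmul(B, A))
y_merged = matvec(W_merged, x)

max_diff = max(abs(a - b) for a, b in zip(y_unmerged, y_merged))
print(max_diff)                  # tiny, floating-point rounding only
```

After merging, the layer has the same shape as the original one, so this sketch adds no extra layer to the forward pass, unlike an adapter placed between existing layers.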

1 comments

leobg 1183 days ago

Thanks for the clarification. Does that mean then that when parallelization is not important, training an adapter might be just as good as or better than LoRA?


edwardjhu 1182 days ago

If latency is irrelevant, I don't think there is a strong practical reason to prefer one over the other. (LoRA is more elegant in my biased opinion because you roughly recover finetuning with a large r.) In practice, you see one do a little better on some tasks and vice versa on others, as observed by papers after mine.
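One way to read the "large r" remark, as a sketch rather than a proof: when r equals the hidden size d, the product B·A can represent any d×d update, for example by taking B equal to the update and A equal to the identity. The matrix `delta_W` and the helper names below are hypothetical, and this only shows expressiveness; it says nothing about whether training behaves the same way.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def lora_param_count(d, r):
    """Trainable entries in B (d x r) plus A (r x d) for a square d x d layer."""
    return 2 * d * r

d = 3
# An arbitrary, made-up update that full finetuning might apply to a d x d weight.
delta_W = [[0.5, -1.0, 2.0],
           [1.5, 0.25, -0.75],
           [-2.0, 1.0, 0.0]]

# With rank r = d, choosing B = delta_W and A = I reproduces the update exactly.
B = delta_W
A = identity(d)
reconstructed = matmul(B, A)

print(reconstructed == delta_W)          # the factorization covers the full update
print(lora_param_count(d, r=d))          # 2 * d * d entries at full rank
print(lora_param_count(4096, r=8))       # far fewer entries at a small rank
```

At small r the factors hold far fewer trainable entries than the full matrix, which is the regime LoRA is normally used in; the full-rank case is the limit the comment points to.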
