| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gliptic 1181 days ago
	Correct me if I'm wrong, but I think you still need to compute gradients of non-trained weights in order to compute the gradients of the LoRA weights. What you don't have to do is store and update the optimizer state for all those non-trained weights.

1 comments

stu2b50 1181 days ago

I mean the derivative of a constant is 0. So if all of the original weights are considered constants, then computing their gradients is trivial, since they’re just zero.

link

jprafael 1181 days ago

Computing gradients is easy/cheap. What this technique solves is that you no longer need to store the computed values of the gradient until the backpropagation phase, which saves on expensive GPU RAM, allowing you to use commodity hardware.

link