Hacker News
by
FeepingCreature
815 days ago
LoRA training/merging is basically "crank up the batch size ridiculously high" in a nutshell, right? What actually breaks when you do that?
brrrrrm
815 days ago
Cranking up the batch size kills convergence.
FeepingCreature
815 days ago
Wonder if that can be avoided by modifying the training approach. Ideas offhand: group by topic, train a subset of weights per node; figure out which layers have the most divergence and reduce lr on those only.
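A rough sketch of the last idea, to make it concrete. It assumes each node reports its update as a dict of layer name to flat list of floats, and it measures "divergence" as the mean squared distance of each node's update from the cross-node mean. That metric, the function names, and the `top_k` and `factor` knobs are all assumptions for illustration, not an established recipe.

```python
def layer_divergence(node_updates):
    """Per-layer mean squared distance of node updates from the cross-node mean.

    node_updates: one dict per node, mapping layer name -> list of floats.
    (This layout is a hypothetical choice for the sketch.)
    """
    n = len(node_updates)
    result = {}
    for name in node_updates[0]:
        size = len(node_updates[0][name])
        # Element-wise mean of this layer's update across all nodes.
        mean = [sum(u[name][i] for u in node_updates) / n for i in range(size)]
        # Average squared deviation from that mean, over nodes and elements.
        result[name] = sum(
            (u[name][i] - mean[i]) ** 2 for u in node_updates for i in range(size)
        ) / (n * size)
    return result


def scaled_learning_rates(divergence, base_lr, top_k=1, factor=0.1):
    """Multiply lr by `factor` on the `top_k` most divergent layers only."""
    worst = set(sorted(divergence, key=divergence.get, reverse=True)[:top_k])
    return {
        name: base_lr * factor if name in worst else base_lr
        for name in divergence
    }


# Made-up updates from two nodes: they disagree on "attn" and agree on "mlp".
updates = [
    {"attn": [1.0, -1.0], "mlp": [0.1, 0.1]},
    {"attn": [-1.0, 1.0], "mlp": [0.1, 0.1]},
]
div = layer_divergence(updates)
lrs = scaled_learning_rates(div, base_lr=1e-3)
```

Here only "attn" gets the reduced learning rate, since that is the layer where the two nodes pull in opposite directions. Whether this actually helps convergence is untested.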
brrrrrm
813 days ago
A provable way to recover convergence is to compute the Hessian. It’s computationally expensive, but there are approximation methods.
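A toy illustration of why curvature information helps, under loud assumptions: the loss is a hand-picked ill-conditioned quadratic, and the "approximation method" is a diagonal Hessian estimated by finite differences of the gradient, which is just one cheap option and not necessarily the one meant above. All names here are hypothetical.

```python
def grad(w):
    # Gradient of the toy loss f(w) = 0.5 * (100 * w[0]**2 + w[1]**2).
    return [100.0 * w[0], 1.0 * w[1]]


def diag_hessian(grad_fn, w, eps=1e-4):
    """Estimate the Hessian diagonal by central differences of the gradient."""
    diag = []
    for i in range(len(w)):
        hi, lo = list(w), list(w)
        hi[i] += eps
        lo[i] -= eps
        diag.append((grad_fn(hi)[i] - grad_fn(lo)[i]) / (2 * eps))
    return diag


def gd_step(w, lr):
    # Plain gradient descent: same step size for every coordinate.
    return [wi - lr * gi for wi, gi in zip(w, grad(w))]


def preconditioned_step(w, lr=1.0, damping=1e-8):
    # Divide each gradient entry by the estimated curvature (plus damping),
    # which is a Newton step when the Hessian is diagonal.
    h = diag_hessian(grad, w)
    return [wi - lr * gi / (abs(hi) + damping) for wi, gi, hi in zip(w, grad(w), h)]


w_gd = [1.0, 1.0]
for _ in range(5):
    w_gd = gd_step(w_gd, lr=0.05)  # lr too large for the steep direction

w_pc = preconditioned_step([1.0, 1.0])
```

With `lr=0.05` the steep coordinate of plain gradient descent blows up, while a single curvature-scaled step lands essentially at the minimum. A real model's Hessian is not diagonal and far too large to probe coordinate by coordinate, so this only shows the principle.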