Hacker News
by
ImprobableTruth
811 days ago
Transmission speeds aren't fast enough for this, unless you crank up the batch size ridiculously high.
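A back-of-the-envelope sketch of why batch size enters here: each node has to ship a full gradient every step, so the per-node batch must be large enough that compute time covers the transfer. All numbers below (parameter count, link speed, throughput) are hypothetical placeholders, and the model ignores latency, compression, and overlap of compute with communication.

```python
import math

def sync_seconds(num_params, bytes_per_param, bandwidth_bytes_per_s):
    """Time to send one full gradient over a link (latency and compression ignored)."""
    return num_params * bytes_per_param / bandwidth_bytes_per_s

def min_batch_to_hide_sync(sync_s, samples_per_second):
    """Smallest per-node batch whose compute time is at least the sync time."""
    return math.ceil(sync_s * samples_per_second)

# Hypothetical setup: 1e9 parameters, 2-byte gradients,
# a 100 Mbit/s link (12.5e6 bytes/s), 100 samples/s of compute per node.
sync_s = sync_seconds(1_000_000_000, 2, 12.5e6)
batch = min_batch_to_hide_sync(sync_s, 100)
print(sync_s, batch)  # 160.0 16000
```

Under these assumed numbers a node would need 16,000 samples per step just to keep the link from being the bottleneck.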
FeepingCreature
810 days ago
LoRA training/merging basically is "crank up the batch size ridiculously high" in a nutshell, right? What actually breaks when you do that?
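A toy sketch of the equivalence being suggested, using a plain 1-D linear model rather than LoRA (an assumption made to keep it small): if every node starts from the same weights, takes one SGD step on an equal-sized shard, and the results are averaged, the merged weight matches a single step on the combined batch. The equality only holds for one step from a shared start; with several local steps per node the two drift apart.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sgd_step(w, xs, ys, lr):
    """One plain SGD update."""
    return w - lr * grad_mse(w, xs, ys)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w0, lr = 0.0, 0.01

# Two nodes with equal shards: one local step each, then merge by averaging.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
merged = sum(sgd_step(w0, sx, sy, lr) for sx, sy in shards) / len(shards)

# One node taking a single step on the full (2x larger) batch.
big_batch = sgd_step(w0, xs, ys, lr)

print(abs(merged - big_batch) < 1e-12)  # True
```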
brrrrrm
810 days ago
Cranking up the batch size kills convergence.
FeepingCreature
810 days ago
Wonder if that can be avoided by modifying the training approach. Ideas offhand: group by topic, train a subset of weights per node; figure out which layers have the most divergence and reduce lr on those only.
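A rough sketch of the last idea, assuming "divergence" means how far each node's copy of a layer sits from the cross-node mean. The function names, the `top_k` and `damping` parameters, and the snapshot data are all hypothetical; this is one possible reading, not a tested training recipe.

```python
def layer_divergence(node_weights):
    """Mean squared distance of each node's layer weights from the cross-node mean."""
    n = len(node_weights)
    dim = len(node_weights[0])
    mean = [sum(w[i] for w in node_weights) / n for i in range(dim)]
    return sum((w[i] - mean[i]) ** 2 for w in node_weights for i in range(dim)) / n

def per_layer_lr(base_lr, divergences, top_k=1, damping=0.1):
    """Multiply lr by `damping` for the top_k most divergent layers only."""
    worst = sorted(divergences, key=divergences.get, reverse=True)[:top_k]
    return {name: (base_lr * damping if name in worst else base_lr)
            for name in divergences}

# Hypothetical snapshot: two layers, flattened weights from three nodes.
snapshot = {
    "layer0": [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],  # nodes agree
    "layer1": [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]],  # nodes disagree
}
divs = {name: layer_divergence(ws) for name, ws in snapshot.items()}
lrs = per_layer_lr(0.5, divs)
print(divs)  # {'layer0': 0.0, 'layer1': 4.0}
```

Here only `layer1` gets the reduced learning rate; `layer0` keeps the base rate.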
brrrrrm
808 days ago
A provable way to recover convergence is to compute the Hessian. It’s computationally expensive, but there are approximation methods.
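The comment does not say which approximation it has in mind. One common family avoids building the Hessian at all and only computes Hessian-vector products, for example by finite differences of the gradient. A minimal sketch on a toy quadratic loss (the loss, the `hvp` helper, and `eps` are illustrative assumptions):

```python
def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * w^T A w with A = [[3, 1], [1, 2]]."""
    return [3.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 2.0 * w[1]]

def hvp(grad_fn, w, v, eps=1e-4):
    """Approximate H @ v with a central difference of gradients.

    Costs two gradient evaluations and never forms the full Hessian.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]

w = [1.0, -1.0]
v = [1.0, 0.0]
hv = hvp(grad, w, v)
print(hv)  # approximately [3.0, 1.0], the first column of A
```

For a real network the gradient function would come from an autodiff framework, which can usually compute these products exactly rather than by finite differences.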