HN Mirror


	by ImprobableTruth 811 days ago
	Transmission speeds aren't fast enough for this, unless you crank up the batch size ridiculously high.
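
To put rough numbers on the bandwidth point: if every step ships a full gradient over the link, each node's batch has to take at least as long to compute as the transfer does, or the GPU sits idle. A back-of-envelope sketch; the model size, fp16 gradients, 100 Mbit/s uplink and 0.05 s per sample are made-up illustrative assumptions, and the helper name is hypothetical.

```python
import math


def min_batch_to_hide_comm(n_params, bytes_per_param, bandwidth_bytes_per_s, seconds_per_sample):
    """Hypothetical helper: smallest per-node batch whose compute time covers one gradient exchange.

    Assumes a naive scheme: every optimizer step ships the whole gradient once over
    the link, uncompressed, while the next batch computes in parallel.
    """
    comm_seconds = n_params * bytes_per_param / bandwidth_bytes_per_s
    return math.ceil(comm_seconds / seconds_per_sample)


# Hypothetical numbers: a 7e9-parameter model with fp16 gradients (2 bytes each),
# a 100 Mbit/s uplink (12.5e6 bytes/s), 0.05 s of forward+backward per sample.
b = min_batch_to_hide_comm(7e9, 2, 12.5e6, 0.05)
print(b)
```

Under those assumptions one exchange takes about 1120 s, so each node needs a batch of roughly 22,400 samples per step just to break even.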

FeepingCreature 810 days ago

LoRA training/merging basically is "crank up the batch size ridiculously high" in a nutshell, right? What actually breaks when you do that?
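
For reference, a LoRA adapter stores a low-rank update ΔW = (α/r)·B·A next to a frozen weight W, so several nodes can train adapters independently and combine them afterwards. A minimal pure-Python sketch of merging by averaging the nodes' deltas, assuming plain averaging is the merge rule (other merge schemes exist); the function names are hypothetical.

```python
def matmul(X, Y):
    # Plain nested-list matrix product: X is n x m, Y is m x p.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lora_delta(B, A, alpha, rank):
    # LoRA update delta_W = (alpha / r) * B @ A, with B: d x r and A: r x k.
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]


def merge_lora_adapters(W, adapters, alpha, rank):
    """Hypothetical helper: add the mean of several nodes' LoRA deltas to the frozen W.

    Averages the full products B_i @ A_i; averaging the B and A factors
    separately is not equivalent.
    """
    deltas = [lora_delta(B, A, alpha, rank) for B, A in adapters]
    n = len(deltas)
    return [[W[i][j] + sum(d[i][j] for d in deltas) / n for j in range(len(W[0]))]
            for i in range(len(W))]


# Two nodes, each with a rank-1 adapter on a 2 x 2 base weight.
W = [[0.0, 0.0], [0.0, 0.0]]
node1 = ([[1.0], [0.0]], [[1.0, 0.0]])  # (B, A)
node2 = ([[0.0], [1.0]], [[0.0, 1.0]])
merged = merge_lora_adapters(W, [node1, node2], alpha=1.0, rank=1)
print(merged)  # [[0.5, 0.0], [0.0, 0.5]]
```

Averaging the products rather than the factors matters here: averaging the two Bs and the two As first would give [[0.25, 0.25], [0.25, 0.25]] instead.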

brrrrrm 810 days ago

Cranking up the batch size kills convergence.
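
One mechanism behind this is easy to show: with a fixed budget of training samples, a bigger batch means fewer parameter updates, and at the same learning rate the model gets less far. A toy sketch, assuming noise-free 1-D linear regression trained with minibatch SGD; it does not model generalization effects, and it ignores learning-rate rescaling (e.g. scaling the rate with batch size), which recovers part of the gap up to a point.

```python
def train(batch_size, sample_budget=256, lr=0.1, true_w=3.0):
    """Minibatch SGD on noise-free 1-D regression y = true_w * x, starting from w = 0.

    Every run sees the same sample_budget samples once, so a larger batch
    simply means fewer updates.
    """
    xs = [((i % 16) + 1) / 16 for i in range(sample_budget)]
    w, steps = 0.0, 0
    for start in range(0, sample_budget, batch_size):
        batch = xs[start:start + batch_size]
        # Gradient of the mean squared error (w*x - true_w*x)^2 over the batch.
        grad = sum(2 * (w - true_w) * x * x for x in batch) / len(batch)
        w -= lr * grad
        steps += 1
    mean_x2 = sum(x * x for x in xs) / len(xs)
    loss = (w - true_w) ** 2 * mean_x2  # MSE over the whole dataset
    return steps, loss


for bs in (1, 16, 256):
    steps, loss = train(bs)
    print(f"batch={bs:3d} steps={steps:3d} loss={loss:.3g}")
```

Batch size 1 takes 256 steps and drives the loss to essentially zero; batch size 256 takes a single step and stays near 2.8.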

FeepingCreature 810 days ago

Wonder if that can be avoided by modifying the training approach. Ideas offhand: group the data by topic and train a subset of the weights per node; or figure out which layers have the most divergence and reduce the learning rate on those only.
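
The second idea could be prototyped by measuring, at each sync, how far the nodes' copies of each layer have drifted apart, then shrinking the learning rate only for the worst layers. A sketch under loud assumptions: "divergence" is taken to be the per-parameter variance across nodes averaged over the layer, and the function names, `top_k` and `shrink` are hypothetical choices, not an established method.

```python
from statistics import pvariance


def layer_divergence(node_weights):
    """node_weights: one {layer_name: [floats]} dict per node, all with the same shapes.

    Returns {layer_name: per-parameter variance across nodes, averaged over the layer}.
    """
    out = {}
    for name in node_weights[0]:
        # zip(*...) yields one tuple per parameter holding that parameter's value on every node.
        per_param = zip(*(nw[name] for nw in node_weights))
        variances = [pvariance(vals) for vals in per_param]
        out[name] = sum(variances) / len(variances)
    return out


def per_layer_lr(node_weights, base_lr, top_k=1, shrink=0.1):
    # Hypothetical rule: multiply the learning rate by `shrink` for the top_k most divergent layers only.
    div = layer_divergence(node_weights)
    worst = set(sorted(div, key=div.get, reverse=True)[:top_k])
    return {name: base_lr * shrink if name in worst else base_lr for name in div}


nodes = [
    {"embed": [1.0, 1.0], "head": [0.0, 2.0]},
    {"embed": [1.0, 1.1], "head": [2.0, 0.0]},
]
print(layer_divergence(nodes))
lrs = per_layer_lr(nodes, base_lr=0.01)
print(lrs)
```

Here the "head" layer, whose two copies disagree strongly, gets the reduced rate while "embed" keeps the base rate.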

brrrrrm 808 days ago

A provable way to recover convergence is to calculate the Hessian. It’s computationally expensive, but there are approximation methods.
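
One common approximation avoids forming the Hessian at all: Hessian-free (truncated) Newton, which solves H d = -g with conjugate gradient and gets each Hessian-vector product H v from two gradient evaluations. A sketch on a toy 2-D quadratic, assuming a positive-definite Hessian; it illustrates the second-order machinery, not the specific convergence guarantee the comment refers to, and the function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hvp_fd(grad_fn, w, v, eps=1e-5):
    # Hessian-vector product by central differences of the gradient:
    # H v ~= (g(w + eps*v) - g(w - eps*v)) / (2*eps). Two gradient calls; H is never formed.
    gp = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]


def newton_step_cg(grad_fn, w, max_iters=None, tol=1e-12):
    """One Newton step w + d, where H d = -g is solved by conjugate gradient."""
    g = grad_fn(w)
    d = [0.0] * len(w)
    r = [-gi for gi in g]  # residual of H d = -g at d = 0
    p = list(r)
    rs = dot(r, r)
    # In exact arithmetic CG needs at most len(w) iterations.
    for _ in range(max_iters or len(w)):
        if rs < tol:
            break
        Hp = hvp_fd(grad_fn, w, p)
        curvature = dot(p, Hp)
        if curvature <= 0:
            break  # H is not positive definite along p; plain CG cannot continue
        alpha = rs / curvature
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return [wi + di for wi, di in zip(w, d)]


# Toy quadratic f(w) = 0.5 * w^T A w - b^T w with A = [[3, 1], [1, 2]], b = [1, 1].
def grad(w):
    return [3 * w[0] + w[1] - 1, w[0] + 2 * w[1] - 1]


w_new = newton_step_cg(grad, [0.0, 0.0])
print(w_new)  # close to [0.2, 0.4] = A^-1 b
```

On a quadratic a single Newton step lands on the minimizer. On real networks the Hessian is generally indefinite, so practical second-order methods add damping or substitute a Gauss-Newton approximation.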