| HN Mirror



	by brrrrrm 810 days ago
	Cranking up the batch size kills convergence.

1 comments

FeepingCreature 810 days ago

Wonder if that can be avoided by modifying the training approach. Ideas offhand: group the data by topic and train a subset of weights per node; or figure out which layers have the most divergence and reduce the learning rate on those only.
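A rough sketch of the last idea, to make it concrete. Everything here is an assumption for illustration rather than an established recipe: "divergence" is taken to mean how much the workers' gradients for a layer disagree relative to their mean, and the names `layer_divergence`, `per_layer_lr`, `top_k`, and `damp` are hypothetical.

```python
def layer_divergence(worker_grads):
    """Disagreement between workers for ONE layer.

    worker_grads: list with one flat gradient (list of floats) per worker.
    Returns the mean squared distance from the average gradient, divided by
    the squared norm of the average gradient (assumed divergence measure).
    """
    n = len(worker_grads)
    dim = len(worker_grads[0])
    mean = [sum(g[i] for g in worker_grads) / n for i in range(dim)]
    spread = sum(
        sum((g[i] - mean[i]) ** 2 for i in range(dim)) for g in worker_grads
    ) / n
    mean_sq = sum(m * m for m in mean)
    return spread / (mean_sq + 1e-12)  # small constant avoids division by zero


def per_layer_lr(grads_by_layer, base_lr, top_k=1, damp=0.1):
    """Scale the learning rate down by `damp` on the `top_k` most divergent layers."""
    scores = {name: layer_divergence(g) for name, g in grads_by_layer.items()}
    noisy = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return {name: base_lr * (damp if name in noisy else 1.0) for name in scores}


# Toy data: two workers, two layers. The workers agree on "embed" and
# point in opposite directions on "head".
grads = {
    "embed": [[1.0, 1.0], [1.0, 1.1]],
    "head": [[1.0, -1.0], [-1.0, 1.0]],
}
lrs = per_layer_lr(grads, base_lr=0.01)
```

Here only "head" gets the reduced rate, since its per-worker gradients cancel out while the "embed" gradients nearly match. Whether this actually helps large-batch training is untested; it only shows the bookkeeping.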


brrrrrm 808 days ago

A provable way to recover convergence is to compute the Hessian. It’s computationally expensive, but there are approximation methods.
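For a sense of what such approximations can look like, here is a minimal sketch of two standard tricks, not necessarily the ones meant above: a Hessian-vector product from central differences of the gradient, and Hutchinson's randomized estimate of the Hessian trace. Neither forms the full matrix. The toy quadratic, `grad_fn`, and the parameter values are assumptions chosen for illustration.

```python
import random


def hvp(grad_fn, x, v, eps=1e-4):
    """Approximate H(x) @ v as (g(x + eps*v) - g(x - eps*v)) / (2*eps)."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    g_plus, g_minus = grad_fn(x_plus), grad_fn(x_minus)
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]


def hutchinson_trace(grad_fn, x, samples=100, seed=0):
    """Estimate trace(H) as the average of v^T H v over random +/-1 vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        v = [rng.choice((-1.0, 1.0)) for _ in x]
        hv = hvp(grad_fn, x, v)
        total += sum(vi * hi for vi, hi in zip(v, hv))
    return total / samples


# Toy quadratic f(x) = 0.5 * x^T A x, so the gradient is A x and the Hessian is A.
A = [[3.0, 1.0], [1.0, 2.0]]


def grad_fn(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


hv = hvp(grad_fn, [0.5, -0.5], [1.0, 0.0])       # approximately the first column of A
trace_est = hutchinson_trace(grad_fn, [0.5, -0.5])  # noisy estimate of trace(A) = 5
```

Each product costs two gradient evaluations, which is why these methods are attractive when the full Hessian is too large to store. In a real training setup the finite difference would usually be replaced by automatic differentiation.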
