| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hodgehog11 236 days ago
	It's a hyperparameter much like learning rate. If the learning rate is too high, the training process would not work either. Addressing this is just a matter of a grid search.