| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ks1723 847 days ago
	Just to add to this, only the two learning rates are changed, everything else including initialization and data is fixed. From the paper: Training consists of 500 (sometimes 1000) iterations of full batch steepest gradient descent. Training is performed for a 2d grid of η0 and η1 hyperparameter values, with all other hyperparameters held fixed (including network initialization and training data).