| HN Mirror


	by bosco_mcnasty 847 days ago
	Your point is valid, but the paper explains it clearly. No, they are NOT dimensionally reduced hyperparameters. The hyperparameters are learning rates, that's it. X axis: learning rate for the input layer (the network has 1 hidden layer). Y axis: learning rate for the output layer. So what this is saying is that, for certain ill-chosen learning rates, model convergence is, for lack of a better word, chaotic and unstable.

1 comment

ks1723 847 days ago

Just to add to this: only the two learning rates are changed; everything else, including initialization and data, is fixed. From the paper:

Training consists of 500 (sometimes 1000) iterations of full batch steepest gradient descent. Training is performed for a 2d grid of η0 and η1 hyperparameter values, with all other hyperparameters held fixed (including network initialization and training data).
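For anyone who wants to poke at this, here is a rough sketch of that kind of sweep in plain Python. It is not the paper's code. The network size, tanh activation, random data, mean squared error loss, and the divergence threshold are all my own assumptions, and the function names (`make_problem`, `train`, `grid`) are made up for illustration. The only things taken from the quote are the shape of the experiment: full batch gradient descent, one learning rate per layer (`eta0` for the input layer, `eta1` for the output layer), and the same initialization and data for every grid cell.

```python
import math
import random


def make_problem(n_in=2, n_hidden=4, n_samples=8, seed=0):
    """Build fixed data and fixed initial weights, shared by every grid cell."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_samples)]
    y = [rng.gauss(0, 1) for _ in range(n_samples)]
    W0 = [[rng.gauss(0, 1) / math.sqrt(n_in) for _ in range(n_in)]
          for _ in range(n_hidden)]
    W1 = [rng.gauss(0, 1) / math.sqrt(n_hidden) for _ in range(n_hidden)]
    return X, y, W0, W1


def train(eta0, eta1, problem, steps=500):
    """Full batch gradient descent with one learning rate per layer.

    Returns the loss measured at the start of the last step, or inf if the
    loss became non-finite or exceeded an (arbitrary, assumed) 1e12 threshold.
    """
    X, y, W0, W1 = problem
    W0 = [row[:] for row in W0]  # copy, so every run starts from the same init
    W1 = W1[:]
    n = len(X)
    loss = float("inf")
    for _ in range(steps):
        g0 = [[0.0] * len(row) for row in W0]
        g1 = [0.0] * len(W1)
        loss = 0.0
        for x, t in zip(X, y):
            # Forward pass: hidden = tanh(W0 x), output = W1 . hidden
            h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W0]
            err = sum(w * hj for w, hj in zip(W1, h)) - t
            loss += err * err / n
            # Backward pass for the mean squared error
            for j, hj in enumerate(h):
                g1[j] += 2 * err * hj / n
                back = 2 * err * W1[j] * (1 - hj * hj) / n
                for i, xi in enumerate(x):
                    g0[j][i] += back * xi
        if not math.isfinite(loss) or loss > 1e12:
            return float("inf")  # treat as diverged
        for j in range(len(W1)):
            W1[j] -= eta1 * g1[j]          # output layer uses eta1
            for i in range(len(W0[j])):
                W0[j][i] -= eta0 * g0[j][i]  # input layer uses eta0
    return loss


def grid(etas0, etas1, problem, steps=500):
    """Rows follow eta1 (Y axis), columns follow eta0 (X axis).

    Each cell is True if training stayed finite, False if it diverged.
    """
    return [[math.isfinite(train(e0, e1, problem, steps)) for e0 in etas0]
            for e1 in etas1]


if __name__ == "__main__":
    problem = make_problem()
    etas = [10 ** (k / 2 - 2) for k in range(8)]  # log-spaced learning rates
    for row in reversed(grid(etas, etas, problem, steps=200)):
        print("".join("#" if ok else "." for ok in row))
```

A grid this coarse will not show any fine structure. The point is only that each cell is a complete training run that differs from its neighbours in nothing but the two learning rates.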
