by srean
1514 days ago
I don't disagree, except perhaps with the lolcatz's demand for rigour. "Improve with small and simple steps till you can't" is not a bad idea after all. BTW, your randomized algorithm is, with a minor tweak, surprisingly (unbelievably) effective: randomize the weights of the hidden layers, then do gradient descent on just the final layer. Note that the loss is even convex in the last-layer weights if the matching (canonical) activation function is used. In fact you don't even have to try different random choices, though of course that would help. The random kitchen sinks results are a more recent heir to this line of work. I suspect you already know this, and also that the noise in SGD does indeed regularize, and that the way it does so for convex functions has been well understood since the 70s, so I am leaving this tidbit for others who are new to this area.
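For anyone who wants to poke at the "random hidden layer, train only the last layer" idea, here is a minimal sketch in plain Python. Everything specific in it is my own assumption for illustration, not something from the thread: the XOR-like toy data, the tanh hidden units, the Gaussian scales for the random weights, and the helper names (`hidden_layer`, `train_last_layer`, `demo`). The output unit is a sigmoid paired with cross-entropy, which is one instance of a matching (canonical) pairing, so the loss is convex in the output weights `v` once the hidden layer is frozen.

```python
import math
import random

def hidden_layer(x, W, b):
    """Fixed random hidden layer tanh(W x + b), plus a constant 1 for the output bias."""
    return [math.tanh(sum(wk * xk for wk, xk in zip(w, x)) + bj)
            for w, bj in zip(W, b)] + [1.0]

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def logistic_loss(H, y, v):
    """Mean cross-entropy; convex in v because the features H are held fixed."""
    total = 0.0
    for h, t in zip(H, y):
        z = sum(vj * hj for vj, hj in zip(v, h))
        # log(1 + exp(z)) - t*z, written to avoid overflow.
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - t * z
    return total / len(y)

def train_last_layer(H, y, steps=200):
    """Plain gradient descent on the output weights v only."""
    n, d = len(H), len(H[0])
    # Every feature lies in [-1, 1], so the loss is (d/4)-smooth and
    # a step of 4/d can never increase it.
    lr = 4.0 / d
    v = [0.0] * d
    losses = []
    for _ in range(steps):
        losses.append(logistic_loss(H, y, v))
        grad = [0.0] * d
        for h, t in zip(H, y):
            err = sigmoid(sum(vj * hj for vj, hj in zip(v, h))) - t
            for j in range(d):
                grad[j] += err * h[j] / n
        v = [vj - lr * gj for vj, gj in zip(v, grad)]
    losses.append(logistic_loss(H, y, v))
    return v, losses

def demo(seed=0, n=200, hidden=30):
    rng = random.Random(seed)
    # Assumed toy problem: XOR-like labels, not linearly separable in the raw inputs.
    X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    y = [1.0 if x1 * x2 > 0 else 0.0 for x1, x2 in X]
    # Hidden weights and biases are drawn once and never updated.
    W = [(rng.gauss(0, 2), rng.gauss(0, 2)) for _ in range(hidden)]
    b = [rng.gauss(0, 1) for _ in range(hidden)]
    H = [hidden_layer(x, W, b) for x in X]
    v, losses = train_last_layer(H, y)
    hits = sum((sum(vj * hj for vj, hj in zip(v, h)) > 0) == (t == 1.0)
               for h, t in zip(H, y))
    return hits / n, losses

if __name__ == "__main__":
    acc, losses = demo()
    print("train accuracy:", acc)
    print("loss: first", losses[0], "last", losses[-1])
```

Because only `v` is trained, this is just logistic regression on random features, which is why the convexity claim holds. The pure-stdlib loops are there so it runs anywhere; for realistic sizes you would vectorize it, and trying several random draws of `W` and `b` (different `seed` values) is the "that would help" part.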