by okigan 3725 days ago
In this case you can see that it is the swiss roll, so you could say: pick a "proper initialization".

But that technique would not work when you cannot see that the data is a "swiss roll", or when it has many dimensions.

1 comment

I'm pretty sure he wasn't talking about the swiss roll specifically. Big gains in neural net performance have come from better initialization schemes. These are not dataset-specific, just general: for example, an initialization scheme might adapt the initial weight distribution to the number of hidden units in the next layer. Smaller models are also, in general, more sensitive to initialization.
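
To make that concrete, here is a minimal sketch of Glorot (Xavier) uniform initialization, one well-known scheme where the range of the initial weights shrinks as the layers on either side get wider. The comment above doesn't name a particular scheme, so choosing Glorot is my assumption, and the names `glorot_limit` and `glorot_uniform` are made up for illustration.

```python
import math
import random


def glorot_limit(fan_in, fan_out):
    """Half-width of the uniform range used by Glorot/Xavier initialization.

    fan_in  = number of units feeding into the layer
    fan_out = number of units in the next layer
    """
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform(fan_in, fan_out, seed=0):
    """Return a fan_in x fan_out weight matrix (list of lists).

    Each weight is drawn uniformly from [-limit, +limit], where the limit
    depends only on the layer sizes, not on the dataset.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    limit = glorot_limit(fan_in, fan_out)
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_out)]
        for _ in range(fan_in)
    ]


if __name__ == "__main__":
    # Wider layers get a narrower initial weight range.
    for n in (4, 40, 400):
        print(n, glorot_limit(n, n))
```

Nothing here looks at the data, swiss roll or otherwise: the only inputs are the layer widths, which is the sense in which such schemes are "general".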