HN Mirror (Hacker News)

by sanxiyn 591 days ago

Geoffrey Hinton (now a Nobel Prize winner!) gave his own summary. I think it is the single best summary of this topic.

  Our labeled datasets were thousands of times too small.
  Our computers were millions of times too slow.
  We initialized the weights in a stupid way.
  We used the wrong type of non-linearity.
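The last point is commonly read as saturating units (such as the logistic sigmoid) versus ReLU. Here is a minimal sketch, not Hinton's own code, of why that matters in deep nets. The sigmoid's derivative never exceeds 0.25, so a backpropagated gradient passing through many sigmoid units shrinks geometrically. ReLU's derivative is exactly 1 for any positive input. The sketch ignores the weights (treats them as 1) and places every sigmoid unit at its steepest point, which is the best case for the sigmoid.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value 0.25, reached at x == 0

def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, depth: int, pre_activation: float) -> float:
    """Product of the local derivatives across `depth` layers (weights assumed to be 1)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(pre_activation)
    return g

for depth in (2, 10, 30):
    print(depth,
          chained_grad(sigmoid_grad, depth, 0.0),  # best case for the sigmoid
          chained_grad(relu_grad, depth, 1.0))     # any positive pre-activation
```

At depth 10 the sigmoid factor is already 0.25**10, about 1e-6, while the ReLU factor stays at 1.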

3 comments

helltone 589 days ago

I'm curious and it's not obvious to me: what changed in terms of weight initialisation?


imjonse 591 days ago

That is a pithier formulation of the widely accepted summary of "more data + more compute + algo improvements".


sanxiyn 591 days ago

No, it isn't. It emphasizes the importance of Glorot initialization and ReLU.
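On the initialization question above, here is a minimal sketch under stated assumptions: a plain fully connected tanh network, no biases, width 100, depth 10. It compares a naive "small random numbers" init with Glorot/Xavier uniform init. The naive init is N(0, 0.01²), a hypothetical choice made for illustration. Glorot init draws weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which keeps activation variance roughly constant from layer to layer. The sketch tracks the standard deviation of the activations at each layer. With the naive init the signal collapses toward zero, and the backward pass is starved of gradient in the same way.

```python
import math
import random

def naive_init(fan_in, fan_out, rng):
    # "Small random numbers": std 0.01 regardless of layer width (illustrative choice).
    return [[rng.gauss(0.0, 0.01) for _ in range(fan_in)] for _ in range(fan_out)]

def glorot_uniform_init(fan_in, fan_out, rng):
    # Glorot & Bengio (2010): U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
    # which gives each weight variance 2 / (fan_in + fan_out).
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

def forward_stds(init_fn, width=100, depth=10, seed=0):
    """Push one random input through `depth` tanh layers; return each layer's activation std."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    stds = []
    for _ in range(depth):
        w = init_fn(width, width, rng)  # weight matrix, shape (fan_out, fan_in)
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
        mean = sum(x) / width
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in x) / width))
    return stds

print("naive :", ["%.1e" % s for s in forward_stds(naive_init)])
print("glorot:", ["%.1e" % s for s in forward_stds(glorot_uniform_init)])
```

With these settings the naive column should shrink by roughly a factor of ten per layer: 100 inputs times a weight variance of 1e-4 gives a variance gain of 0.01. The Glorot column should decay only slowly, because tanh saturates. For ReLU layers, He et al. (2015) use a weight variance of 2 / fan_in instead.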


HarHarVeryFunny 584 days ago

Also:

nets too small (not enough layers)

gradients not flowing (residual connections)

layer outputs not normalized

training algorithms and procedures not optimal (Adam, warm-up, etc.)
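Three of the fixes in this list can be sketched in plain Python under stated assumptions.

* Layer normalization rescales each vector to zero mean and unit variance. The learnable gain and bias are omitted here.
* A residual connection adds the block's input back to its output, so the identity path carries gradient straight through.
* An Adam step can use a learning rate that ramps up linearly during a warm-up period.

The "pre-norm" arrangement x + f(LayerNorm(x)), the hypothetical sublayer `f`, and the hyperparameter values in the usage lines are illustrative choices, not taken from the comment.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one vector to zero mean, unit variance (learnable gain/bias omitted).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual_block(x, sublayer):
    # Pre-norm residual: output = x + sublayer(LayerNorm(x)).
    # The Jacobian is I + (sublayer term), so gradients still flow
    # through the identity path even if the sublayer's gradient is tiny.
    return [xi + fi for xi, fi in zip(x, sublayer(layer_norm(x)))]

def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    # Linear warm-up from base_lr / warmup_steps up to base_lr, then constant (decay omitted).
    return base_lr * min(1.0, (step + 1) / warmup_steps)

class Adam:
    """Adam (Kingma & Ba) for a flat list of scalar parameters."""

    def __init__(self, n_params, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentered variance) estimates
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.t = 0

    def step(self, params, grads, lr):
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            out.append(p - lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out

# Usage: minimize sum(p_i^2) with Adam plus a 100-step warm-up.
params = [3.0, -2.0]
opt = Adam(len(params))
for step in range(2000):
    grads = [2.0 * p for p in params]
    params = opt.step(params, grads, warmup_lr(step, base_lr=0.05, warmup_steps=100))
print(params)

# A hypothetical sublayer that outputs zeros leaves the residual block as the identity.
x = [1.0, 2.0, 3.0]
print(residual_block(x, lambda h: [0.0] * len(h)))
```

The warm-up keeps early updates small while Adam's moment estimates are still based on only a few gradients.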
