Hacker News
by sushirain 3973 days ago
Deep neural networks are effective because:

- They use more parameters (and fewer computations per parameter).

- They are hierarchical (convolutions appear to be useful at different levels of abstraction of the data).

- They use distributed representations (word2vec, thought vectors), so they are not restricted to a small set of artificial classes such as parts of speech or parts of visual objects.

- They are recurrent (RNN).

etc.
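The "hierarchical" point can be illustrated with a minimal sketch, assuming 1D signals and stride-1 convolutions without padding. Each stacked layer combines neighbouring outputs of the layer below, so a unit at a higher level sees a wider window of the raw input. The kernels here are hypothetical hand-picked values, not learned weights, and the helper names are illustrative only.

```python
def conv1d(signal, kernel):
    """Valid (no padding), stride-1 1D cross-correlation, as in a CNN layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in xs]

def receptive_field(num_layers, kernel_size):
    """Input samples seen by one output unit after stacking num_layers
    valid, stride-1 convolutions of the same kernel size."""
    return num_layers * (kernel_size - 1) + 1

x = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0]
# Level 1 (hypothetical kernel): detects local increases in the signal.
level1 = relu(conv1d(x, [-1.0, 1.0]))
# Level 2 (hypothetical kernel): combines neighbouring level-1 features.
level2 = relu(conv1d(level1, [1.0, 1.0]))

print(level1)                  # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print(level2)                  # [2.0, 2.0, 1.0, 0.0, 0.0, 1.0]
print(receptive_field(2, 2))   # 3 input samples per level-2 unit
```

With only two small layers each level-2 unit already depends on three raw samples; stacking more layers widens that window further, which is one way to read "different levels of abstraction".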


word2vec isn't "deep" in the relevant sense. Both the skip-gram and CBOW forms have a single hidden layer.
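A minimal sketch of that point, assuming the plain full-softmax formulation (the original word2vec tool also offers hierarchical softmax and negative sampling, which change the output layer but not the single hidden layer). The matrix names `W_in`/`W_out`, the helper functions, and the toy values are hypothetical, chosen only for illustration.

```python
import math

def skipgram_probs(center_id, W_in, W_out):
    """Skip-gram forward pass with a full softmax.
    W_in:  V x d input embeddings; W_out: V x d output embeddings.
    Returns, for every vocabulary word, its probability of being a context word."""
    # The single hidden layer: a linear row lookup, no nonlinearity.
    h = W_in[center_id]
    # Output layer: one dot product per vocabulary word, then softmax.
    logits = [sum(a * b for a, b in zip(h, row)) for row in W_out]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cbow_hidden(context_ids, W_in):
    """CBOW's single hidden layer: the average of the context words' input vectors."""
    d = len(W_in[0])
    return [sum(W_in[c][k] for c in context_ids) / len(context_ids) for k in range(d)]

# Toy vocabulary of 3 words with embedding size d = 2 (hypothetical values).
W_in = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
W_out = [[0.2, 0.0], [-0.1, 0.4], [0.3, 0.3]]

probs = skipgram_probs(1, W_in, W_out)
hidden = cbow_hidden([0, 2], W_in)  # approximately [0.05, 0.35]
```

In both variants the only layer between input and output is a linear lookup (or an average of lookups), which is why word2vec is usually described as a shallow model.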