by Homunculiheaded 3693 days ago

A single-layer autoencoder with n nodes is equivalent to doing PCA and taking the first n principal components. If you're familiar with PCA in natural language processing, where it is called Latent Semantic Analysis (or Indexing), projecting high-dimensional data onto a lower-dimensional surface can actually improve your features. This is because similar words will project onto the same principal component, allowing you to model some semantic information.
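To make the "similar words project onto the same component" point concrete, here is a minimal sketch. The five-word vocabulary and four tiny documents are invented for illustration. It uses a plain truncated SVD on the uncentered count matrix, which is how LSA is usually described; that is closely related to PCA but not identical, since PCA centers the data first.

```python
import numpy as np

# Toy corpus (hypothetical): rows are documents, columns are term counts.
terms = ["car", "automobile", "engine", "banana", "fruit"]
X = np.array([
    [1, 0, 1, 0, 0],   # "car engine"
    [0, 1, 1, 0, 0],   # "automobile engine"
    [0, 0, 0, 1, 1],   # "banana fruit"
    [0, 0, 0, 1, 1],   # "fruit banana"
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

car, auto, banana = (terms.index(t) for t in ("car", "automobile", "banana"))

# Raw term vectors are the columns of X: "car" and "automobile" never
# occur in the same document, so they look unrelated.
raw_sim = cosine(X[:, car], X[:, auto])

# LSA: keep only the top-k singular directions of the count matrix.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = (Vt[:k] * S[:k, None]).T   # shape (n_terms, k)

lsa_sim = cosine(term_vecs[car], term_vecs[auto])
cross_sim = cosine(term_vecs[car], term_vecs[banana])

print("raw   car vs automobile:", raw_sim)
print("LSA   car vs automobile:", lsa_sim)
print("LSA   car vs banana:    ", cross_sim)
```

Because both words co-occur with "engine", they land on the same direction in the reduced space even though their raw vectors are orthogonal, while "banana" stays on a separate direction.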

Autoencoders with more than 1 layer are more interesting because you end up doing what is essentially non-linear PCA by projecting your data onto a curved manifold. This famous paper, "Reducing the Dimensionality of Data with Neural Networks" [0], by Hinton shows the improvement in how linearly separable documents become once multi-layer autoencoders are used.

The old argument was that unsupervised pretraining helps get proper weights faster, but this has largely been disproven. However, I do believe AEs assist in semi-supervised learning because they project the initial data into a more useful space. As you can see in the article I linked, the projected data are much more linearly separable.

And as practical evidence: I used a 5-layer AE in the Kaggle black box competition [1] to eventually outrank a team of Hinton's grad students. The problem had a large unlabeled data set and only a small number of labels. Using the autoencoders before the MLP ended up nearly doubling our team's score.

[0] https://www.cs.toronto.edu/~hinton/science.pdf

[1] https://www.kaggle.com/c/challenges-in-representation-learni...

2 comments

nomailing 3693 days ago

Thank you for the answer. That makes a lot of sense.

Just a side note: as far as I know, a single-layer autoencoder and PCA are only equivalent if the units have no activation function (i.e. a linear activation), which is usually not the case.
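A rough way to check the linear case numerically, as a sketch rather than a proof: train a single-hidden-layer autoencoder with no activation and no bias by plain gradient descent, then compare its end-to-end map with the PCA projector. The synthetic data, learning rate, and step count are assumptions picked for this toy setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 5-D points whose variance is
# concentrated in two directions, rotated by a random orthogonal matrix.
n, d, k = 500, 5, 2
stds = np.array([3.0, 2.0, 0.3, 0.2, 0.1])
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
X = (rng.normal(size=(n, d)) * stds) @ Q.T
X -= X.mean(axis=0)  # PCA works on centered data

# PCA: orthogonal projector onto the top-k principal directions.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_projector = Vt[:k].T @ Vt[:k]

# Linear autoencoder: x -> x @ W_enc -> (x @ W_enc) @ W_dec.
# No activation function and no bias terms.
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))
lr = 0.01
for _ in range(5000):
    H = X @ W_enc                 # hidden codes, shape (n, k)
    R = H @ W_dec - X             # reconstruction residual, shape (n, d)
    # Gradients of the squared reconstruction error averaged over samples.
    grad_dec = 2.0 / n * H.T @ R
    grad_enc = 2.0 / n * X.T @ (R @ W_dec.T)
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec

# End-to-end linear map learned by the autoencoder.
ae_projector = W_enc @ W_dec
max_diff = float(np.abs(ae_projector - pca_projector).max())
print("max |AE map - PCA projector| =", max_diff)
```

Only the end-to-end map is compared here, because the hidden codes themselves can differ from the PCA scores by any invertible k-by-k linear transform. Once a nonlinearity such as a sigmoid is applied to the hidden units, this argument no longer applies directly.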

_0ffh 3693 days ago

"The old argument was that unsupervised pretraining helps get proper weights faster, but this has largely been disproven."

Do you hold that to be true in general, or only when using dropout?
