by chestervonwinch 4155 days ago
It's not completely by chance. There's an old paper [1] showing that if the activation functions are well approximated by their Taylor expansions up to the linear term, then the optimal weights for encoding and decoding are the same as those given by PCA.
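
To make the linear case concrete, here is a rough sketch in my own notation (the symbols X, W_1, W_2, U_k and G are assumptions for illustration, not taken from [1]). Take X as the centered data matrix with one sample per row, W_1 as the n-by-k encoder and W_2 as the k-by-n decoder. With purely linear units the reconstruction is X W_1 W_2, which has rank at most k, so by the Eckart-Young theorem the best it can do is the rank-k PCA projection:

```latex
\min_{W_1 \in \mathbb{R}^{n \times k},\; W_2 \in \mathbb{R}^{k \times n}}
  \lVert X - X W_1 W_2 \rVert_F^2
\quad\Longrightarrow\quad
X W_1 W_2 = X U_k U_k^\top ,
\qquad \text{e.g. } W_1 = U_k G,\;\; W_2 = G^{-1} U_k^\top
```

Here U_k holds the top k eigenvectors of X^T X as columns (assuming the k-th eigenvalue is strictly larger than the (k+1)-th), and G is any invertible k-by-k matrix. So the weights are pinned down only up to a change of basis inside the PCA subspace, which is why "same subspace" is the right thing to compare rather than the individual weight vectors.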

There are probably newer results on this topic.

However, I will say that I've trained some autoencoders on toy datasets like those found in scikit-learn, and the subspaces learned by the autoencoder and those found through PCA were often similar, if not identical. For example, if my input vectors were in R^n (with n > 3) and I restricted the autoencoder to 3 hidden units, the encoding matrix of the autoencoder would span the same subspace as the first 3 principal component directions.
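
If anyone wants to try this, here is a minimal sketch of that kind of comparison. It is not the code I originally ran: it assumes a purely linear autoencoder trained with plain gradient descent, uses synthetic Gaussian data instead of a scikit-learn set, and every name in it (make_toy_data, train_linear_autoencoder, max_principal_angle) plus the learning rate, step count and seeds are made up for illustration. It measures agreement as the largest principal angle between the encoder's column space and the top 3 PCA directions.

```python
import numpy as np


def make_toy_data(m=500, seed=0):
    """Hypothetical toy set: 6-D Gaussian data with 3 dominant directions."""
    rng = np.random.default_rng(seed)
    scales = np.array([3.0, 2.5, 2.0, 0.5, 0.4, 0.3])  # per-axis std devs
    n = scales.size
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal basis
    return (rng.normal(size=(m, n)) * scales) @ Q.T


def pca_directions(X, k):
    """First k principal directions, as the rows of a (k, n) array."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]


def train_linear_autoencoder(X, k, lr=0.01, steps=5000, seed=1):
    """Gradient descent on ||Xc @ W_enc @ W_dec - Xc||_F^2 / (2 * m)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    m, n = Xc.shape
    # Small random init (an assumption; other schemes would also work).
    W_enc = rng.normal(scale=0.01, size=(n, k))
    W_dec = rng.normal(scale=0.01, size=(k, n))
    for _ in range(steps):
        H = Xc @ W_enc                    # hidden codes, shape (m, k)
        R = H @ W_dec - Xc                # reconstruction residual, (m, n)
        grad_dec = H.T @ R / m
        grad_enc = Xc.T @ (R @ W_dec.T) / m
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec


def max_principal_angle(A, B):
    """Largest principal angle (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(np.arccos(np.clip(cosines.min(), -1.0, 1.0)))


if __name__ == "__main__":
    X = make_toy_data()
    W_enc, _ = train_linear_autoencoder(X, k=3)
    V = pca_directions(X, k=3)
    print("largest principal angle (radians):", max_principal_angle(W_enc, V.T))
```

An angle near zero means the two subspaces coincide, while pi/2 means at least one direction is completely missed. Note that the individual encoder columns generally won't match the individual principal components, only the subspace they span. With nonlinear activations the match would only be expected to the extent the linear approximation holds.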

[1]: http://oucsace.cs.ohiou.edu/~razvan/courses/dl6900/papers/bo...