Hacker News
by yobbo 1762 days ago
> savvy data scientists complain to me "this NN is just approximating SVD/PCA ..."

It wouldn't be "approximating" anything. An optimal one-hidden-layer linear NN "autoencoder" is PCA: its weights span the same subspace as the top principal components. There are algorithms for PCA other than gradient descent, but the infrastructure for training NNs on big datasets makes it painless.
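To make the "linear autoencoder is PCA" point concrete, here is a minimal NumPy sketch. It assumes synthetic Gaussian data, a squared-error loss and plain full-batch gradient descent. The helper names (`pca_basis`, `train_linear_autoencoder`, `subspace_distance`) are made up for illustration. The learned weights match PCA only up to an invertible mixing inside the subspace, so the sketch compares subspaces rather than individual vectors.

```python
import numpy as np

def pca_basis(X, k):
    """Top-k principal directions of centered X, as columns of a d x k matrix."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def train_linear_autoencoder(X, k, lr=0.01, steps=3000, seed=0):
    """Full-batch gradient descent on (1/n) * ||X @ W_enc @ W_dec - X||_F^2."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(scale=0.1, size=(d, k))  # encoder: d -> k, no activation
    W_dec = rng.normal(scale=0.1, size=(k, d))  # decoder: k -> d, no activation
    for _ in range(steps):
        Z = X @ W_enc                  # codes, n x k
        R = Z @ W_dec - X              # reconstruction residual, n x d
        g_enc = 2.0 / n * X.T @ (R @ W_dec.T)
        g_dec = 2.0 / n * Z.T @ R
        W_enc -= lr * g_enc
        W_dec -= lr * g_dec
    return W_enc, W_dec

def subspace_distance(A, B):
    """Frobenius distance between orthogonal projectors onto col(A) and col(B)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.norm(Qa @ Qa.T - Qb @ Qb.T)

# Synthetic data: 5-D Gaussian with two dominant directions, randomly rotated.
rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X = (rng.normal(size=(500, 5)) * np.array([3.0, 2.0, 0.5, 0.3, 0.1])) @ Q
X -= X.mean(axis=0)

W_enc, W_dec = train_linear_autoencoder(X, k=2)
V = pca_basis(X, k=2)
ae_err = np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1))
pca_err = np.mean(np.sum((X @ V @ V.T - X) ** 2, axis=1))
print("subspace distance:", subspace_distance(W_dec.T, V))  # close to 0
print("autoencoder MSE:", ae_err, " PCA MSE:", pca_err)     # nearly equal
```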

As soon as you add activations and layers, you're improving on SVD/PCA. For dimensionality reduction, it means the "manifold" can be more complicated than a linear subspace.
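A toy illustration of the "manifold" point, assuming points evenly spaced on a unit circle (a 1-D curve in 2-D). The encoder/decoder pair below (angle in, point on the circle out) is hand-written, not a trained network. It only shows what a nonlinear 1-D code can represent that a linear 1-D projection cannot.

```python
import numpy as np

# Points evenly spaced on the unit circle: a 1-D curve embedded in 2-D.
n = 360
theta = 2 * np.pi * np.arange(n) / n
X = np.column_stack([np.cos(theta), np.sin(theta)])  # already zero-mean

# Linear 1-D code: project onto the top principal direction (via SVD).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[:1].T                      # 2 x 1 principal direction
X_lin = X @ v @ v.T               # rank-1 linear reconstruction
lin_err = np.mean(np.sum((X - X_lin) ** 2, axis=1))

# Nonlinear 1-D code (hand-written, not learned): the angle itself.
def encode(X):
    return np.arctan2(X[:, 1], X[:, 0])

def decode(z):
    return np.column_stack([np.cos(z), np.sin(z)])

X_nl = decode(encode(X))
nl_err = np.mean(np.sum((X - X_nl) ** 2, axis=1))
print(f"linear 1-D code MSE:    {lin_err:.3f}")  # prints 0.500
print(f"nonlinear 1-D code MSE: {nl_err:.2e}")   # round-off level
```

This data's covariance is 0.5·I, so every linear 1-D projection leaves a mean squared error of 0.5. The angle code reconstructs each point up to floating-point round-off. A trained nonlinear autoencoder would have to find a mapping like this by optimization instead of having it handed over.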


> As soon as you add activations and layers, you're improving on SVD/PCA

You're expanding the space of realizable functions, which is an improvement in a specific sense, but not in all senses! The SVD, since it is better understood theoretically, is a more straightforward problem to solve robustly. There are fewer hyperparameters (like learning rate) to choose, and you aren't left wondering whether your solution is at a bad local minimum.

I think it's wrong to assume it's an obvious improvement.
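A small sketch of the hyperparameter point. It assumes synthetic Gaussian data and a linear autoencoder trained by full-batch gradient descent. `gd_linear_ae` is a hypothetical helper, and the learning rates 0.01 and 1.0 are arbitrary picks. The SVD gives the optimal rank-k error in closed form (Eckart–Young). Gradient descent reaches that error only if the step size is small enough; with a too-large step it blows up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([3.0, 2.0, 0.5, 0.3, 0.1])
X -= X.mean(axis=0)
k = 2

# SVD route: closed form, nothing to tune. By Eckart-Young, the best rank-k
# reconstruction error is the sum of the discarded squared singular values.
s = np.linalg.svd(X, compute_uv=False)
pca_err = np.sum(s[k:] ** 2) / len(X)

def gd_linear_ae(X, k, lr, steps=3000, seed=0):
    """Final mean squared reconstruction error of a GD-trained linear autoencoder."""
    n, d = X.shape
    r = np.random.default_rng(seed)
    W_enc = r.normal(scale=0.1, size=(d, k))
    W_dec = r.normal(scale=0.1, size=(k, d))
    for _ in range(steps):
        Z = X @ W_enc
        R = Z @ W_dec - X
        loss = np.mean(np.sum(R ** 2, axis=1))
        if not np.isfinite(loss) or loss > 1e6:
            return float("inf")  # treat a blown-up loss as divergence
        # Simultaneous update: both gradients use the old weights.
        W_enc, W_dec = (W_enc - lr * 2.0 / n * X.T @ (R @ W_dec.T),
                        W_dec - lr * 2.0 / n * Z.T @ R)
    return float(np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1)))

results = {lr: gd_linear_ae(X, k, lr) for lr in (0.01, 1.0)}
print(f"SVD (no hyperparameters): {pca_err:.4f}")
for lr, err in results.items():
    print(f"gradient descent, lr={lr}: {err:.4f}")
```

A linear autoencoder has no bad local minima, only saddle points, so this isolates the learning-rate issue. The local-minimum worry can appear once nonlinearities are added.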