Hacker News
by kevin948 1760 days ago
This is spot on with my own observations, especially as we get into modelling more 'abstract' ideas.

As more NN methods become viable, some more savvy data scientists complain to me "this NN is just approximating SVD/PCA/POD/etc!" Wonderful, that's explicitly the point! The network we're applying to this problem compares/combines multiple approaches to dimension reduction. The network created a latent space that makes way more semantic sense than just PCA or SVD for this problem (No Free Lunch). It still takes effort and understanding, but the value I've personally gotten over just applying PCA for my problem-sets has been incredible. In fact I'm certain it has made my career. Turns out diagonalizing covariance matrices isn't the only dimension reduction game in town!
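For anyone who hasn't seen the baseline being compared against, here is a minimal sketch, assuming NumPy and a made-up random dataset, of PCA computed two ways. Diagonalizing the sample covariance matrix and taking the SVD of the centered data give the same principal axes (up to sign) and the same variances, which is why "SVD/PCA" tend to get lumped together. `pca_via_covariance` and `pca_via_svd` are hypothetical helper names for illustration.

```python
import numpy as np

def pca_via_covariance(X, k):
    """Top-k principal axes by diagonalizing the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]       # largest variance first
    return eigvals[order], eigvecs[:, order]

def pca_via_svd(X, k):
    """Top-k principal axes from the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # s is sorted descending
    variances = s[:k] ** 2 / (len(X) - 1)               # singular values -> variances
    return variances, Vt[:k].T

# Toy correlated data (an assumption purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))

vals_cov, axes_cov = pca_via_covariance(X, 2)
vals_svd, axes_svd = pca_via_svd(X, 2)

print(np.allclose(vals_cov, vals_svd))
# Each axis is only defined up to sign, so compare |cosine| instead of raw vectors.
print(np.allclose(np.abs(np.sum(axes_cov * axes_svd, axis=0)), 1.0))
```

Both checks should print `True`. The SVD route is usually preferred in practice because it never forms the covariance matrix explicitly, which is numerically better behaved.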


It seems so strange to me as a software/data engineer to read things like this from a fairly adjacent space and understand none of the words.
As someone who's spent 20 years tuning my own genetic algorithms, being swamped by newbs who spout fancy language and don't even want to know how to write the code themselves just feels like what it is - a new generation of recent business grads who swapped "blockchain" for "ML". Soon to be separated into "founders" and real estate agents, while the rest of us toil in the vineyard. So goes it.
Neural networks let anyone bullshit a good game until things get tricky. Back in the day, FrontPage was going to kill web development because anyone could make a website. Now we can slap together newer tech we don’t understand and profit will ensue.

It’ll democratise but it’s not there yet.

I've been doing it for two years and am barely past the "understand none of the words" phase.

It helps to think of each term as an interesting puzzle. For example, SVD. It's fascinating if you dig into it. Most people don't want to, because it feels like work. For me, it's neat understanding ... whatever it is, ha.

I think it's finding the basis eigenvectors of a transformation in a higher dimensional space, which basically just means that, e.g., for a plain (unskewed) cube the eigenvectors are the X, Y, Z axes you're used to. If you skew it along the X axis, the Y axis tilts a bit, along with the cube.

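Stretch each of those directions by its singular value and you get a box whose volume is the volume of the transformed shape. So the product of the singular values, which is the absolute value of the determinant, is the volume a unit cube ends up with after the transformation.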
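A quick numerical check of the volume idea, assuming NumPy; the shear and stretch matrices are made up for illustration. For a square matrix, the absolute value of the determinant equals the product of the singular values, and both give the volume of the unit cube after the transformation.

```python
import numpy as np

# Hypothetical example transform: a shear along X (the "skew"), then a stretch.
shear = np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
stretch = np.diag([2.0, 1.0, 3.0])
A = stretch @ shear

U, s, Vt = np.linalg.svd(A)   # s holds the singular values (stretch factors), descending

# The unit cube maps to a parallelepiped; its volume is |det A|,
# which should match the product of the singular values.
volume_from_det = abs(np.linalg.det(A))
volume_from_svd = np.prod(s)
print(volume_from_det, volume_from_svd)   # both 6 up to rounding

# The Z axis is still a singular direction, but the shear tilts the
# other two away from X and Y within the X-Y plane.
print(np.round(Vt, 3))
```

Here the stretch contributes a factor of 2 * 1 * 3 = 6 and the shear contributes 1, since shears preserve volume.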

In higher dimensional spaces, it's the same thing, except it's called "eigenvectors" (named after Sir Eigen of Eigenmadethisup) because mathematicians have reasons for using complicated language, some of which is valid. But as you see from me muddling through this, the underlying concepts are all small simple pieces that fit together.

Or I was nowhere close to the explanation of SVD. But it was close to something interesting, since it leads to the question of "What's the SVD of a sphere? How about a point cloud?" It was easy to figure out for a cube. Not so easy when it's an arbitrary shape. "And why is it useful?" Because it gives a lot of hints about what that object is. In the best case, in StyleGAN for example, it can even give you basis vectors like "smile", "age", and so on. (You know in FaceApp how you can drag the "Age" slider and make yourself look older or younger? That's a basis vector in higher dimensional space. It's orthogonal -- more or less -- to "smile", because if you drag the "smile" basis vector around, it doesn't make you look older or younger. Except it's not quite orthogonal, because the faces live on a higher dimensional weird-ass shape, so straight-line directions can't keep every attribute perfectly separate. That's why sometimes when you make someone older their hair turns grey even though "blonde hair" is orthogonal to "age" in theory.)

Yada, yada. Rinse and repeat and dive in for a couple years. You'll find it's fun once you jump in.

P.S. All the people reading this that feel offended like "No, you really must start with theory; you can't possibly learn anything if you don't know what you're doing," you better read this: http://thecodist.com/article/the_programming_steamroller_wai...

That steamroller is coming for you. Once the legions of JavaScript programmers realize that hey, I can do DL just like an ML researcher, you're gonna be doomed. Because a 17yo JS programmer has roughly 10x as much determination as even I can muster these days, let alone someone who clings to the idea that theory is the only path forward.

> savvy data scientists complain to me "this NN is just approximating SVD/PCA ..."

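It wouldn't be "approximating" anything. An optimal one-layer linear NN "autoencoder" trained on squared reconstruction error is PCA, in the sense that it projects onto the same subspace as the top principal components (its individual units needn't line up with them). There are other learning algorithms for PCA than gradient descent, but the infrastructure for training NNs on big data sets makes it painless.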
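A minimal sketch of that claim, assuming NumPy, a toy rotated Gaussian dataset, and a hand-picked (hypothetical, untuned) learning rate, step count, and initialization: train a linear autoencoder with plain gradient descent and compare its encode-then-decode map with the PCA projection. What matches is the subspace; the learned code units can be any rotation of the top principal axes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 5, 2

# Toy data (assumption): independent features with decreasing spread, randomly rotated.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
X = (rng.normal(size=(n, d)) * np.array([3.0, 2.0, 1.0, 0.8, 0.6])) @ Q
X -= X.mean(axis=0)

# PCA: orthogonal projection onto the top-k right singular vectors.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_pca = Vt[:k].T @ Vt[:k]

# Linear autoencoder, no activations: code = X @ W_enc, reconstruction = code @ W_dec.
W_enc = 0.1 * rng.normal(size=(d, k))
W_dec = 0.1 * rng.normal(size=(k, d))
lr, steps = 0.01, 5000            # illustrative hyperparameters, not tuned
for _ in range(steps):
    code = X @ W_enc
    R = code @ W_dec - X          # reconstruction error
    # Gradients of the mean squared reconstruction error sum(R**2) / n.
    grad_dec = 2.0 / n * code.T @ R
    grad_enc = 2.0 / n * X.T @ R @ W_dec.T
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec

# The learned encode-then-decode map should match the PCA projection,
# even though the individual columns of W_enc need not be the principal axes.
P_ae = W_enc @ W_dec
print(np.abs(P_ae - P_pca).max())
```

The printed gap should be at floating-point noise level. Push the learning rate too high and the loop can diverge instead, which is the kind of tuning the SVD route never asks for.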

As soon as you add activations and layers, you're improving on SVD/PCA. For dimensionality reduction, it means the "manifold" can be curved rather than just a linear subspace.

> As soon as you add activations and layers, you're improving on SVD/PCA

You're expanding the space of realizable functions, which is an improvement in a specific sense, but not in all senses! The SVD, since it is better understood theoretically, is a more straightforward problem to solve robustly. There are fewer hyperparameters (like learning rate) to choose, and you aren't left wondering whether your solution is stuck at a bad local minimum.

I think it's wrong to think that it's an obvious improvement.