Hacker News
by fluidcruft 4153 days ago
> In other words, matrix factorization is the real deal. It’s the stuff that separates true data scientists from charlatans — the data alchemists, data phrenologists, and data astrologers of the world

I certainly hope this is sarcasm. Matrix factorization is, like, the go-to tool for people that don't know anything about what they are studying (i.e. it's the phrenologist's favorite weapon). Factor a matrix, throw the results out there, slap on some perfunctory "discussion" that has no real mechanistic insight. Boom. Published.

But maybe I'm describing "the stuff that separates true scientists from data scientists".

Data science manifesto: The purpose of computing is numbers.

4 comments

I have a running joke with my machine learning friends that I will write a Data Science/ML book titled "A Thousand Ways to Say 'Singular Value Decomposition'". The number of papers and techniques out there that are SVD with a few minor tweaks and a unique philosophical interpretation of SVD is hilarious.

Here are some examples:

Principal Component Analysis - SVD used for dimensionality reduction, keeping enough components to account for some n% of the variance.

One layer Autoencoder - SVD done by a neural network

Latent Semantic Analysis - SVD on a tf-idf matrix where we interrupt lower dimensions as having semantic importance

Matrix Factorization - SVD only now we interrupt lower dimensions as representing latent variables

Collaborative Filtering - SVD where we interrupt lower dimensions as representing latent variables AND we use a distance measure to determine similarity.
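For anyone who wants to see the PCA line above in code, here is a minimal sketch, assuming NumPy and a made-up data matrix `X`. The function name `pca_via_svd` and the `variance_to_keep` parameter are hypothetical, not from any library. It centers the data, takes the SVD, and keeps the smallest number of components whose cumulative variance share reaches the target.

```python
import numpy as np

def pca_via_svd(X, variance_to_keep=0.9):
    """Return (components, explained_ratio), keeping just enough components
    to reach variance_to_keep of the total variance."""
    Xc = X - X.mean(axis=0)                      # PCA assumes centered columns
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))                           # guard against rounding at 100%
    return Vt[:k], explained[:k]

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples that mostly vary along a single direction.
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[3.0, 2.0, 1.0]]) + 0.05 * rng.normal(size=(200, 3))

components, ratio = pca_via_svd(X, variance_to_keep=0.9)
```

The rows of `components` are the right singular vectors of the centered data, which match the top eigenvectors of the covariance matrix up to sign.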

> One layer Autoencoder - SVD done by a neural network

Not necessarily. Any serious user of autoencoders would apply some kind of L1 regularization or other sparsity constraint to the coefficients learned, so that the autoencoder does not learn the principal components of the data but instead learns an analogous sparse decomposition of the data (with the assumption that sparse representations have better generalization power).
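A rough sketch of that difference, under assumptions of my own: a tied-weight linear autoencoder trained by plain gradient descent, with an optional L1 penalty on the codes. The data, the `train_autoencoder` helper, and all hyperparameters are hypothetical. Without the penalty, the reconstruction can at best match the rank-k SVD (Eckart-Young); with it, the network trades some reconstruction error for smaller, sparser codes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 300 samples in 6 dimensions with 2 dominant directions.
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(300, 6))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, 2) / np.sqrt(X.shape[0])   # rescale so the top variance is 1
k = 2

def train_autoencoder(X, k, l1=0.0, lr=0.05, steps=2000, seed=0):
    """Tied-weight linear autoencoder: code = X @ W, reconstruction = code @ W.T."""
    init = np.random.default_rng(seed)
    W = 0.1 * init.normal(size=(X.shape[1], k))
    n = X.shape[0]
    for _ in range(steps):
        code = X @ W
        resid = code @ W.T - X
        # Gradient of mean squared reconstruction error with respect to W.
        grad = (2.0 / n) * (X.T @ resid @ W + resid.T @ code)
        # Subgradient of the L1 penalty on the codes (zero when l1 == 0).
        grad += (l1 / n) * (X.T @ np.sign(code))
        W -= lr * grad
    return W

W_plain = train_autoencoder(X, k)
W_sparse = train_autoencoder(X, k, l1=0.1)

# Best possible rank-k reconstruction, from the SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_svd = (U[:, :k] * s[:k]) @ Vt[:k]

err_svd = np.linalg.norm(X - X_svd)
err_plain = np.linalg.norm(X - X @ W_plain @ W_plain.T)
err_sparse = np.linalg.norm(X - X @ W_sparse @ W_sparse.T)
```

Comparing `err_plain` and `err_sparse` against `err_svd` shows how far each variant sits from the SVD optimum; neither can go below it.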

Also I don't think any of the techniques you mentioned is being passed off as "not SVD" by its practitioners. People know they're SVD. These names are just used as labels for use cases of SVD, each with their specific (and crucial) bells and whistles. And yes, these labels are useful.

Cognition is fundamentally dimensionality reduction over a space of information, so clearly most ML algorithms are going to be isomorphic to SVD in some way. More interesting to me are the really non-obvious ways in which that is happening (e.g. neural networks learning word embeddings with skip-gram are actually factorizing a matrix of pointwise mutual information between words and their local contexts...)

That doesn't make these algorithms any less valuable.
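To make the skip-gram remark above concrete, here is a toy count-based counterpart, not skip-gram itself: build a word-context co-occurrence matrix, turn it into positive pointwise mutual information (PPMI), and factorize it with SVD. The corpus, window size, and embedding dimension are all made up for illustration, and clipping at zero is a common simplification I am assuming here.

```python
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a dog",
]  # hypothetical toy corpus
window = 2  # words on each side counted as context (assumed)

tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each word appears within `window` positions of another.
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[w], index[sent[j]]] += 1

# PMI(w, c) = log( P(w, c) / (P(w) P(c)) ), clipped at zero to get PPMI.
p_wc = counts / counts.sum()
p_w = p_wc.sum(axis=1, keepdims=True)
p_c = p_wc.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))       # -inf where a pair never co-occurs
ppmi = np.where(pmi > 0, pmi, 0.0)

# Low-rank factorization: rows of `embeddings` are the word vectors.
U, s, Vt = np.linalg.svd(ppmi)
k = 2
embeddings = U[:, :k] * np.sqrt(s[:k])
```

On a real corpus the matrix would be sparse and far larger, so a truncated or randomized SVD would be the more practical choice.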

I'd also add here that you can add other variables into the mix, such as Gaussian noise and dropout, which are the basis for a lot of fundamental neural networks. I get the intent, but it's not necessarily the case.

Neural word embeddings are one of the most fun things I work with: word2vec, GloVe, and paragraph vectors alike.

There's also the ability to learn varying length windows of phrases via recursive or convolutional methods.

Nota bene, for anyone having trouble parsing Homunculiheaded's description of each algorithm: s/interrupt/interpret

NMF != SVD.
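A small sketch of why the two differ, assuming NumPy and a made-up nonnegative matrix: NMF (here via the classic multiplicative updates, with a hypothetical `nmf` helper) keeps every entry of both factors nonnegative, while the SVD factors of the same matrix necessarily contain mixed signs beyond the first singular vector.

```python
import numpy as np

def nmf(V, rank, steps=500, eps=1e-9, seed=0):
    """Multiplicative-update NMF for the Frobenius objective ||V - W @ H||."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(steps):
        # Each update multiplies by a nonnegative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((6, 3)) @ rng.random((3, 5))    # hypothetical nonnegative rank-3 data

W, H = nmf(V, rank=3)
U, s, Vt = np.linalg.svd(V, full_matrices=False)
svd_rank3 = (U[:, :3] * s[:3]) @ Vt[:3]

nmf_error = np.linalg.norm(V - W @ H)
svd_error = np.linalg.norm(V - svd_rank3)
```

The SVD gives the best rank-3 approximation in Frobenius norm, so `nmf_error` can never be smaller than `svd_error`; what NMF buys is factors that can be read as additive parts.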

I think an important takeaway here is that when you know very little about the nature of your data, dimensionality reduction is a great place to start. You're right to criticize matrix factorization as an end-all be-all tool for "machine learners", but don't mistake Jeremy's point here -- matrix factorization, and dimensionality reduction techniques in general, are a great first step in understanding any dataset.

Given the overall tone of the piece, I feel sure that it is sarcasm. I laughed hard all the way through the piece, starting with:

> But before I get into the details, I want to motivate the algorithm by pointing out an application to the most heavily studied problem in computer science: how to get people to buy more things.

SVDs are as basic as hash-tables are to programming.

That would probably be the second thing people learn right after learning how to do some basic regression analysis.