by rcar
1867 days ago
PCA is a cool technique mathematically, but in my many years of building models, I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model since you're going to have to do a lot of feature preprocessing, but tree ensembles, NNs, etc. are all able to tease out pretty complicated relationships among features on their own. Considering that PCA also complicates things from a model interpretability point of view, it feels to me like a method whose time has largely passed.
This is a strange comment, since my primary use of PCA/SVD is as a first step in understanding the latent factors that drive the data. Latent factors typically capture the important things that anyone running a business or deciding policy cares about: customer engagement, patient well-being, employee happiness, etc. are all latent factors.
If you have ever wanted to perform data analysis and gain some exciting insight into explaining user behavior, PCA/SVD will get you there pretty quickly. It is one of the most powerful tools in my arsenal when I'm working on a project that requires interpretability.
The "loadings" in PCA and the V matrix in SVD both contain information about how the original feature space correlates with the new projection. This can easily show things like "Users who do X, Y and NOT Z are more likely to purchase".
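To make the loadings point concrete, here is a minimal sketch using plain NumPy. The data is synthetic and the feature names (`does_X`, `does_Y`, `does_Z`) and the hidden `engagement` factor are made up for illustration; real data will not be this clean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical latent factor that we never observe directly.
engagement = rng.normal(size=n)

# Three observed features: X and Y rise with engagement, Z falls with it.
X = np.column_stack([
    engagement + 0.3 * rng.normal(size=n),
    engagement + 0.3 * rng.normal(size=n),
    -engagement + 0.3 * rng.normal(size=n),
])
feature_names = ["does_X", "does_Y", "does_Z"]

# PCA via SVD of the centered data: rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[0]                      # weights of each feature on the first PC
explained = S**2 / np.sum(S**2)       # fraction of variance per component

print(f"PC1 explains {explained[0]:.0%} of the variance")
for name, w in zip(feature_names, loadings):
    print(f"{name}: {w:+.2f}")
```

The overall sign of a principal direction is arbitrary, so only the relative signs matter: `does_X` and `does_Y` load with one sign and `does_Z` with the opposite one, which is the "X, Y and NOT Z" pattern.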
Likewise, in LSA (Latent Semantic Analysis/Indexing) on a term-frequency matrix you will get a first pass at a semantic embedding. You'll notice, for example, that "dog" and "cat" project onto a common PC in the new space, which can be interpreted as "pets".
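A toy version of the LSA idea, again in plain NumPy. The five documents and their term counts are invented for illustration, and the matrix is deliberately tiny so the structure is easy to see.

```python
import numpy as np

terms = ["dog", "cat", "vet", "stock", "bond"]

# Rows are documents, columns are term counts (made-up toy data).
counts = np.array([
    [2, 0, 1, 0, 0],   # a document about dogs
    [0, 2, 1, 0, 0],   # a document about cats
    [1, 1, 1, 0, 0],   # a document about both
    [0, 0, 0, 2, 1],   # finance
    [0, 0, 0, 1, 3],   # finance
], dtype=float)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# LSA: truncated SVD of the (uncentered) term-frequency matrix.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
term_vecs = (Vt[:k] * S[:k, None]).T   # one k-dimensional vector per term

idx = {t: i for i, t in enumerate(terms)}
raw_dog_cat = cosine(counts[:, idx["dog"]], counts[:, idx["cat"]])
lsa_dog_cat = cosine(term_vecs[idx["dog"]], term_vecs[idx["cat"]])
lsa_dog_stock = cosine(term_vecs[idx["dog"]], term_vecs[idx["stock"]])

print(f"raw cosine(dog, cat)   = {raw_dog_cat:.2f}")
print(f"LSA cosine(dog, cat)   = {lsa_dog_cat:.2f}")
print(f"LSA cosine(dog, stock) = {lsa_dog_stock:.2f}")
```

In the raw count space "dog" and "cat" look only weakly similar because they rarely share a document, but in the 2-dimensional latent space they land on the same axis, while "stock" lands on the other one.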
> I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model
PCA/SVD is a linear transformation of the data, so it shouldn't give you any performance increase on a linear model. However, it can be very helpful for transforming extremely high-dimensional, sparse vectors into lower-dimensional, dense representations. This can provide a lot of storage/performance benefits.
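A quick way to check the linear-model claim is to fit ordinary least squares on the raw features and on the PCA-rotated features and compare. This is a sketch on synthetic data with a hypothetical helper `ols_predict`; it only looks at the in-sample fit and says nothing about regularized models, where the rotation can matter.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 6

# Synthetic correlated features and a noisy linear target.
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def ols_predict(A, y):
    """Fit least squares with an intercept and return in-sample predictions."""
    A1 = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(A1, y, rcond=None)
    return A1 @ coef

# PCA rotation from the SVD of the centered data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_full = Xc @ Vt.T        # all d components: just a rotation of the features
Z_k = Xc @ Vt[:2].T       # only the top 2 components: a dense (n, 2) array

pred_raw = ols_predict(X, y)
pred_full = ols_predict(Z_full, y)
pred_k = ols_predict(Z_k, y)

max_gap = np.max(np.abs(pred_raw - pred_full))
mse_raw = np.mean((y - pred_raw) ** 2)
mse_k = np.mean((y - pred_k) ** 2)

print(f"largest prediction difference, raw vs full PCA: {max_gap:.2e}")
print(f"in-sample MSE raw: {mse_raw:.4f}, top-2 components: {mse_k:.4f}")
```

With all components kept, the fitted values agree up to floating-point error, because the rotated features span the same space. Dropping components can only leave the in-sample fit the same or make it worse; what you gain is the smaller, dense representation.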
> NNs, etc. are all able to tease out pretty complicated relationships among features on their own.
PCA is equivalent to an autoencoder with no non-linear layers trained to minimize the MSE: both recover the same subspace. It is a very good first step towards understanding what your NN will eventually do. After all, an NN learns a non-linear transformation so that your final vector space is, ideally, linearly separable.
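The PCA/autoencoder equivalence can be checked numerically. The sketch below fits a linear encoder `W` and decoder `D` to minimize reconstruction MSE on synthetic data. To keep it short and deterministic it uses alternating least squares rather than the gradient descent an actual NN framework would use, so treat it as an illustration of the shared objective, not of NN training.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 300, 5, 2

# Synthetic data with distinct variances along randomly rotated axes.
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.2])
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
X = (rng.normal(size=(n, d)) * scales) @ rotation
X = X - X.mean(axis=0)

# PCA: project onto the top-k right singular vectors and reconstruct.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T
mse_pca = np.mean((X - X @ V_k @ V_k.T) ** 2)

# Linear autoencoder: code = X @ W, reconstruction = code @ D.
W = rng.normal(size=(d, k))
for _ in range(100):
    # Best decoder for the current codes (least squares).
    D, *_ = np.linalg.lstsq(X @ W, X, rcond=None)
    # Best encoder for the current decoder.
    W = np.linalg.pinv(D)
mse_ae = np.mean((X - X @ W @ D) ** 2)

# Compare the subspaces via their orthogonal projection matrices.
Q, _ = np.linalg.qr(W)
subspace_gap = np.max(np.abs(Q @ Q.T - V_k @ V_k.T))

print(f"PCA MSE: {mse_pca:.6f}, autoencoder MSE: {mse_ae:.6f}")
print(f"largest difference between projectors: {subspace_gap:.2e}")
```

The reconstruction error and the learned subspace match PCA. The individual columns of `W` are generally not the principal components themselves, since any invertible mixing of the k code dimensions gives the same reconstruction.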