Hacker News
by twstws 4611 days ago
Each variable is standardized to mean = 0, standard deviation = 1. If you reject this as arbitrary, you are rejecting correlation analysis as a whole: this is exactly the same standardization applied to the two variables in a bivariate correlation, extended to a multivariate data set.
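To make the link to correlation concrete, here is a minimal sketch in NumPy. The data, the seed, and the helper name `standardize` are made up for illustration; the only claim being checked is that the covariance matrix of the standardized variables equals the correlation matrix of the raw variables.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of 3 variables on very different scales.
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
X[:, 1] += 5.0 * X[:, 0]  # make variables 0 and 1 correlated

def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

Z = standardize(X)
cov_of_Z = np.cov(Z, rowvar=False)        # covariance of the standardized data
corr_of_X = np.corrcoef(X, rowvar=False)  # correlation of the raw data

# The two matrices agree up to floating-point error.
print(np.allclose(cov_of_Z, corr_of_X))
```

So running PCA on standardized data amounts to decomposing the correlation matrix, which is the sense in which the standardization is no more arbitrary than the one built into a correlation coefficient.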

PCA is a form of (or at least related to) correlation. With standardization, the resulting transformation highlights the variables in the original data that are most highly correlated. Without standardization you're visualizing covariation. Unlike correlation, covariation is influenced by the magnitude of the variables.

By standardizing, you control for differences in the magnitude of the variables, and focus on their inherent variation instead.
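A rough illustration of the difference, assuming a made-up data set in which two small-magnitude variables (`a`, `b`) are strongly correlated and a third (`c`) is independent noise recorded in much larger units. The helper `first_pc` is hypothetical; it just returns the leading eigenvector of a symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Hypothetical data: a and b are strongly correlated and small in magnitude;
# c is independent noise measured in much larger units.
a = rng.normal(size=n)
b = 0.9 * a + 0.1 * rng.normal(size=n)
c = 1000.0 * rng.normal(size=n)
X = np.column_stack([a, b, c])

def first_pc(matrix):
    """Return the eigenvector with the largest eigenvalue of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)  # eigenvalues ascending
    return eigenvectors[:, -1]

pc_cov = first_pc(np.cov(X, rowvar=False))        # no standardization
pc_corr = first_pc(np.corrcoef(X, rowvar=False))  # with standardization

print(np.round(np.abs(pc_cov), 3))   # loading concentrated on c
print(np.round(np.abs(pc_corr), 3))  # loadings concentrated on a and b
```

Without standardization the first component simply tracks whichever variable has the largest units; with it, the first component picks up the correlated pair.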

1 comments

1) This doesn't work for cases where your data are positive-definite.

However, let's set that aside. I apologize for being a bit obfuscatory. My point is: If this is the case, then the explanation in the OP is totally misleading, because your data shouldn't look like an ellipsoid, but rather a circle. PCA should only be used in situations where there is a reason to believe there is a mechanistically justifiable "hidden value" that underlies otherwise uncontrolled "independent variables", thus making a dimensional reduction reasonable.

This is not at all the situation that the OP goes over in the first part of the post.

The example was a little clunky, but I don't find it misleading. A biplot of two normalized variables is elliptical if the variables are correlated. This particular hand-drawn example does indeed look a bit weird, but that doesn't detract from the main point. It clearly shows the relationship between the original data and the ordination; it's a rigid rotation.
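Both claims can be checked numerically. The sketch below uses a hypothetical pair of correlated variables; it assumes PCA is computed from the eigenvectors of the correlation matrix, and it notes that the eigenvector matrix is orthogonal (a rotation, possibly combined with a reflection, depending on the signs the solver returns).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# Hypothetical pair of correlated variables.
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)
X = np.column_stack([x, y])

# Standardize, then take the eigenvectors of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigenvalues, V = np.linalg.eigh(R)  # columns of V are the principal axes
scores = Z @ V                      # the data expressed in the new axes

r = R[0, 1]
# V is orthogonal: the axes are turned (or flipped), never stretched.
is_orthogonal = np.allclose(V.T @ V, np.eye(2))
# Each point keeps its distance from the origin under the transformation.
norms_preserved = np.allclose(np.linalg.norm(Z, axis=1),
                              np.linalg.norm(scores, axis=1))
# For two standardized variables the eigenvalues are 1 - |r| and 1 + |r|,
# so the point cloud is circular only when r = 0.
print(is_orthogonal, norms_preserved)
print(eigenvalues, 1 - abs(r), 1 + abs(r))
```

The unequal eigenvalues are the squared lengths of the ellipse's axes (up to a common scale), which is why standardized but correlated variables still plot as an ellipse rather than a circle.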

This is easily grasped with a 2d example, despite the fact that PCA makes no sense with only two variables.