I don't think "arbitrary linear algebra operations" is a valid critique. If you understand PCA as "take the SVD of the data", then the operations seem arbitrary. But if you understand it as "construct a low-rank approximation, in the L2 sense, to the data or its covariance", then they're not.

I also don't think the (very legitimate) "dimensional" critique of PCA applies here. Every coordinate of the representation has the same units: the presence or absence of a given prime factor.

To the original question: my suspicion is that PCA might pull out the even numbers (first PC) and the numbers divisible by 3 (second PC), because these two factors may explain the most variability in the underlying vector representation. A coordinate that is 1 for about a fraction 1/p of the numbers has variance about (1/p)(1 - 1/p), which is largest for p = 2 and p = 3. If it did, that would be pretty intuitive, although not as interesting.

---

Edited to add: The suspicion turned out to be true. For the first 2000 integers, the top 6 PCs correspond to the first 6 primes (2, 3, 5, 7, 11, 13). Plot at: https://imgur.com/a/qi2Sx5u

Code (MATLAB):

function [nums,pcs]=pca_prime(nMax,nPC)
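nums = zeros(nMax, nMax);
for k = 2:nMax
  nums(k, factor(k)) = 1; % vector representation of "k": column p is 1 if prime p divides k
end
% 2:end because we don't care about 1 as a "prime"
pcs = pca(nums(2:end,:), 'NumComponents', nPC);
end

% Usage (command window or a separate script):
[nums,pcs] = pca_prime(2000,10); % "svd" would work too, on mean-centered data
plot(pcs(:,1:6)); % first 6 PCs: loadings vs. column index (the prime)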
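If the Statistics Toolbox's pca isn't available, the "svd would work too" remark can be made concrete. This is a minimal sketch, assuming base MATLAB or GNU Octave; the variable names are illustrative only. One detail matters: pca centers each column by default, so the SVD has to be taken of the mean-centered matrix, not the raw 0/1 matrix. The last few lines also show the low-rank-approximation view of PCA from the first paragraph: truncating the SVD to nPC components leaves a Frobenius error equal to the root-sum-square of the discarded singular values.

```matlab
% Sketch: PCA of the prime-factor indicator vectors via svd (no Statistics Toolbox).
% Assumes base MATLAB or GNU Octave; variable names are illustrative only.
nMax = 2000;   % first 2000 integers, as in the result above
nPC  = 6;      % number of components to keep

% Row k has a 1 in column p for every prime p dividing k (same encoding as pca_prime).
X = zeros(nMax, nMax);
for k = 2:nMax
  X(k, factor(k)) = 1;
end
X = X(2:end, :);                       % drop the integer 1

% pca() centers each column by default, so do the same before calling svd.
Xc = bsxfun(@minus, X, mean(X, 1));
[U, S, V] = svd(Xc, 'econ');
coeff = V(:, 1:nPC);                   % principal directions (PCA loadings)

% For each PC, the column (i.e., the integer p) with the largest |loading|.
[~, topCol] = max(abs(coeff), [], 1);
disp(topCol)                           % per the result above, expect 2 3 5 7 11 13

% Low-rank view: the rank-nPC truncation's Frobenius error equals the
% root-sum-square of the discarded singular values (Eckart-Young).
s = diag(S);
Xk = U(:, 1:nPC) * S(1:nPC, 1:nPC) * V(:, 1:nPC)';
err = norm(Xc - Xk, 'fro');
predicted = sqrt(sum(s(nPC+1:end) .^ 2));
fprintf('rank-%d error: %.6f (predicted %.6f)\n', nPC, err, predicted);
```

Using abs() on the loadings sidesteps the sign ambiguity of singular vectors, so the result doesn't depend on which sign convention svd or pca happens to pick.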
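The intuition for why 2 and 3 come first can also be checked without any PCA at all. This is a rough sketch, assuming base MATLAB or GNU Octave, with illustrative variable names. It compares each prime column's observed variance with the Bernoulli approximation (1/p)(1 - 1/p) and measures how small the covariances between different prime columns are. When those off-diagonal terms are tiny relative to the gaps between the variances, the covariance matrix is nearly diagonal and its leading eigenvectors roughly line up with the individual prime columns.

```matlab
% Sketch: why 2 and 3 come first. Each prime column is a 0/1 indicator that is 1
% for roughly a fraction 1/p of the integers, so its variance is about (1/p)(1 - 1/p),
% and indicators for different primes are nearly uncorrelated.
% Assumes base MATLAB or GNU Octave; variable names are illustrative only.
nMax = 2000;
X = zeros(nMax, nMax);
for k = 2:nMax
  X(k, factor(k)) = 1;                 % column p is 1 when prime p divides k
end
X = X(2:end, :);                       % drop the integer 1

v = var(X, 0, 1);                      % sample variance of every column
[~, order] = sort(v, 'descend');
disp(order(1:6))                       % columns with the largest variance

p = primes(20);                        % 2 3 5 7 11 13 17 19
approx = (1 ./ p) .* (1 - 1 ./ p);     % variance of a Bernoulli(1/p) indicator
disp([p; v(p); approx])                % prime, observed variance, approximation

% Off-diagonal covariances among these primes are small compared with the
% gaps between the variances, so the covariance matrix is nearly diagonal
% and its eigenvectors (the PCs) roughly line up with the prime columns.
C = cov(X(:, p));
offDiag = C - diag(diag(C));
fprintf('largest |covariance| between different primes: %.2e\n', max(abs(offDiag(:))));
```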