by grph123dot
1187 days ago
>> that the change in the weights as you train also have low intrinsic rank

It seems that the initial weight matrix W has a low-rank approximation A, which implies that the difference E = W - A is small. It also seems that PCA fails when E is sparse, because PCA is designed to be optimal when the error is Gaussian.
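A rough sketch of the sparse-versus-Gaussian point, under assumptions that are not from the thread: a truncated SVD stands in for PCA (no mean-centering), and the matrix size, rank, noise scale, and sparsity level are all made up for illustration. It builds a low-rank matrix, corrupts it once with small dense Gaussian noise and once with a few large sparse entries, and compares how well a rank-r fit recovers the clean matrix.

```python
import numpy as np

def low_rank_approx(W, r):
    """Best rank-r approximation of W in the least-squares (Frobenius) sense."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
n, r = 200, 5  # illustrative sizes, not taken from any real model

# Clean low-rank matrix standing in for the "true" structure.
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# Case 1: small dense Gaussian error.
W_gauss = L + 0.01 * rng.standard_normal((n, n))

# Case 2: sparse error, about 1% of entries hit by large values.
S = np.zeros((n, n))
mask = rng.random((n, n)) < 0.01
S[mask] = 20.0 * rng.standard_normal(mask.sum())
W_sparse = L + S

def rel_err(W):
    """Relative error of the rank-r fit of W against the clean matrix L."""
    A = low_rank_approx(W, r)
    return np.linalg.norm(A - L) / np.linalg.norm(L)

err_gauss = rel_err(W_gauss)
err_sparse = rel_err(W_sparse)
print(f"Gaussian error: {err_gauss:.4f}   sparse error: {err_sparse:.4f}")
```

The least-squares fit spreads a few large outliers across the whole recovered subspace, which is why sparse errors are usually handled with a robust decomposition instead of plain PCA.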
Since the weights are derived from gradient descent, yeah, we don't really know what their distribution would be.
A random projection empirically works quite well for very high dimensions, and is of course very cheap computationally.
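For what "works quite well" can mean in practice, here is a minimal sketch assuming a dense Gaussian random projection and checking how well pairwise distances survive it. The dimensions and the choice of a Gaussian matrix are illustrative assumptions, not something specified above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, d, k = 50, 2000, 500  # illustrative: project from d down to k dims

X = rng.standard_normal((n_points, d))

# Random projection: one matrix multiply, no fitting step.
# Scaling by 1/sqrt(k) keeps squared lengths unchanged in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

def pairwise_dists(Z):
    """Euclidean distances between all distinct pairs of rows of Z."""
    sq = np.sum(Z**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    iu = np.triu_indices(len(Z), k=1)
    return np.sqrt(np.maximum(D2[iu], 0.0))

# Ratio of projected to original distance for every pair; near 1 is good.
ratio = pairwise_dists(Y) / pairwise_dists(X)
print(f"distance ratio: min={ratio.min():.3f}  max={ratio.max():.3f}")
```

Unlike PCA, the projection matrix here never looks at the data, so the cost is a single matrix product and nothing has to be assumed about the error distribution.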