by cfgauss2718
825 days ago
Yes, of course, any positive definite matrix can be used as a metric on the corresponding Euclidean space, but that doesn’t mean it’s necessarily useful as one. Hence I think it’s useful to distinguish things which could be a metric (in that a metric can be constructed from them) from things which, when applied as a metric, actually provide some benefit. In particular, if we believe the manifold hypothesis, then one should expect a useful metric on features to be local and not static. The quantity W’W clearly does not depend on the inputs to the layer at inference time, and so is static.
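To make the static-versus-local distinction concrete, here is a rough NumPy sketch. Everything in it is an assumption for illustration: a single tanh layer, random weights, and the hypothetical helper names `static_metric`, `local_metric`, and `sq_length`. It compares the fixed matrix W’W with the input-dependent pullback metric J(x)’J(x), where J(x) is the Jacobian of the layer at x.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # hypothetical layer weights: 3 inputs -> 4 outputs

def static_metric(W):
    """W'W: one fixed matrix, identical at every input."""
    return W.T @ W

def local_metric(W, x):
    """Pullback metric J(x)'J(x) for the (assumed) layer x -> tanh(W x)."""
    pre = W @ x
    # Jacobian of tanh(W x) with respect to x: diag(1 - tanh(pre)^2) @ W
    J = (1.0 - np.tanh(pre) ** 2)[:, None] * W
    return J.T @ J

def sq_length(G, v):
    """Squared length of a small displacement v under the metric G."""
    return float(v @ G @ v)

v = np.array([1.0, 0.0, 0.0])      # the same displacement, measured at two points
x_a = np.zeros(3)                  # tanh is linear to first order here, so J = W
x_b = np.array([2.0, -1.0, 0.5])   # some units are partly saturated here

static_len = sq_length(static_metric(W), v)
local_len_a = sq_length(local_metric(W, x_a), v)
local_len_b = sq_length(local_metric(W, x_b), v)
print(static_len, local_len_a, local_len_b)
```

Under these assumptions the static and local lengths agree at the origin, while at `x_b` the local metric shrinks the same displacement because the saturated units contribute less. Note also that W’W is only guaranteed to be positive semidefinite; it is positive definite when W has full column rank.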
Moreover, if I’m understanding their argument right, then W’W is proportional to an average of the exterior derivative of the manifold representing the prediction surface of any given NN layer (averaging with respect to the measure defined by the data generating process). While this averaging by definition leaves some of the local information on the cutting room floor, the result is going to be far more interpretable (because we’ve discarded all that distracting local data), and I would assume it will still retain the large-scale structure of the underlying manifold (outside of some gross edge cases).
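One way to probe the averaging claim numerically is sketched below. It is not a reconstruction of the original argument: I am reading the "derivative" as the layer Jacobian J(x), assuming a tanh layer, and using Gaussian samples as a stand-in for the data generating process. The helpers `averaged_local_metric` and `alignment` are hypothetical names. An alignment of exactly 1 would mean the averaged local metric is a positive multiple of W’W; how close it gets will depend on the nonlinearity and the data.

```python
import numpy as np

def averaged_local_metric(W, X):
    """Mean of J(x)'J(x) over the rows x of X, for the layer x -> tanh(W x)."""
    D = 1.0 - np.tanh(X @ W.T) ** 2  # shape (n, out): tanh' at each pre-activation
    # J(x)'J(x) = W' diag(d(x)^2) W, so only the per-unit mean of d^2 is needed.
    return W.T @ (np.mean(D ** 2, axis=0)[:, None] * W)

def alignment(A, B):
    """Cosine similarity of two matrices under the Frobenius inner product."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))       # hypothetical layer weights
X = rng.normal(size=(1000, 3))    # assumed stand-in for the data distribution

G_avg = averaged_local_metric(W, X)
score = alignment(G_avg, W.T @ W)
print(score)
```

The averaged matrix has the same shape and symmetry as W’W, which is what makes it easy to inspect, but the per-input variation of J(x)’J(x) is gone by construction.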