by tech_ken
817 days ago
It has more than just the flavor of a metric: any “square” matrix (M’M) is positive semidefinite, since x’M’Mx=y’y=<y,y>≥0, where y=Mx. It is positive definite, and so exactly a metric, when M has full column rank; otherwise it is a pseudometric that gives zero length to directions in the null space of M. That fits the interpretation: it is a metric which only cares about distances along dimensions that the NN cares about. I agree that the proportionality seems a little obvious. I think what’s most interesting is what they do with the quantity, but I’m surprised no one else has tried this.
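To make the identity x’M’Mx=<Mx,Mx> concrete, here is a minimal pure-Python sketch. The matrices `M_full` and `M_deficient` and the helper names are made up for illustration and are not taken from the paper under discussion. It checks that the quadratic form of M’M equals the squared norm of Mx, and that a rank-deficient M sends a nonzero x to zero length.

```python
def matvec(M, x):
    """Return the matrix-vector product Mx for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def gram(M):
    """Return the Gram matrix M'M (inner products of the columns of M)."""
    cols = list(zip(*M))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def quad_form(A, x):
    """Return the quadratic form x'Ax."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


def sq_norm(y):
    """Return <y, y>."""
    return sum(v * v for v in y)


# Hypothetical example matrices (assumptions for illustration only).
M_full = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]  # 3x2, full column rank
M_deficient = [[1.0, 2.0], [2.0, 4.0]]         # rank 1, null vector (2, -1)

x = [2.0, -1.0]

# x'M'Mx and <Mx, Mx> agree for any M.
print(quad_form(gram(M_full), x), sq_norm(matvec(M_full, x)))            # 5.0 5.0

# With a rank-deficient M, a nonzero x can have zero "length".
print(quad_form(gram(M_deficient), x), sq_norm(matvec(M_deficient, x)))  # 0.0 0.0
```

The second case is the sense in which M’M ignores directions the layer does not use: x is nonzero, but the form assigns it zero length.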
In particular, if we believe the manifold hypothesis, then one should expect a useful metric on features to be local and not static - the quantity W’W clearly does not depend on the inputs to the layer at inference time, and so is static.