| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by cfgauss2718 825 days ago
	Yes of course any positive definite matrix can be used as a metric on the corresponding Euclidean space - but that doesn’t mean it’s necessarily useful as a metric. Hence I think it’s useful to distinguish things which could be a metric (in that a metric can be constructed from them), versus things which when applied as a metric actually provide some benefit. In particular, if we believe the manifold hypothesis, then one should expect a useful metric on features to be local and not static - the quantity W’W clearly does not depend on the inputs to the layer at inference time, and so is static.

1 comments

tech_ken 825 days ago

How are you defining the utility of a metric here? It’s not clear to me why a locally-varying metric would be necessarily more 'useful' than a global one in the context of the manifold hypothesis.

Moreover, if I’m understanding their argument right then W’W is proportional to an average of the exterior derivative of the manifold representing prediction surface of any given NN layer (averaging with respect to the measure defined by the data generating process). While this averaging by definition leaves some of the local information on the cutting room floor, the result is going to be far more interpretable (because we've discarded all that distracting local data) and I would assume will still retain the large-scale structure of the underlying manifold (outside of some gross edge-cases).

link

cfgauss2718 824 days ago

If one thinks of a metric as a “distance measure”, which is to say, how similar is some input x to the “feature” encoded by a layer f(x), and if this feature corresponds to some submanifold of the data, then naturally this manifold will have curvature and the distance measure will do better to account for this curvature. Then generally the metric (in this case, defining a connection on the data manifold) should encode this curvature and therefore is a local quantity. If one chooses a fixed metric, then implicitly the data manifold is being treated as a flat space - like a vector space - which generally it is not. My favorite example for this is the earth, a 2-sphere that is embedded in a higher dimensional space. The correct similarity measure between points is the length of a geodesic connecting those points. If instead one were to just take a flat map (choice of coordinates) and compare their Euclidean distance, it would only be a decent approximation of similarity if the points are already very close. This is like the flat earth fallacy.

link

tech_ken 824 days ago

But this argument seems analogous to someone saying that average height is less “correct” than the full original data set because every individual’s height is different. In one sense it’s not wrong, but it kind of misses the point of averaging. The full local metric tensor defined on the manifold is going to have the same “complexity” as the manifold itself; it’s a bad way of summarizing a model because it’s not any simpler than the model. Their approach is to average that metric tensor over the region of the manifold swept out by the training data, and they show that this average empirically reflects something meaningful about the underlying response manifold in problems that we’re interested in. Whether or not this average quantity can entirely reproduce that original manifold is kind of irrelevant (and indeed undesirable), the point is that it (a) represents something meaningful about the model and (b) it’s low dimensional enough for a human to reason about. Although globally it will not be accurate to distances along the surface, presumably it is “good enough” to at least first order for much of the support of the training data.

link