Hacker News new | ask | show | jobs
by cfgauss2718 815 days ago
If one thinks of a metric as a “distance measure”, which is to say, how similar is some input x to the “feature” encoded by a layer f(x), and if this feature corresponds to some submanifold of the data, then naturally this manifold will have curvature and the distance measure will do better to account for this curvature. Then generally the metric (in this case, defining a connection on the data manifold) should encode this curvature and therefore is a local quantity. If one chooses a fixed metric, then implicitly the data manifold is being treated as a flat space - like a vector space - which generally it is not. My favorite example for this is the earth, a 2-sphere that is embedded in a higher dimensional space. The correct similarity measure between points is the length of a geodesic connecting those points. If instead one were to just take a flat map (choice of coordinates) and compare their Euclidean distance, it would only be a decent approximation of similarity if the points are already very close. This is like the flat earth fallacy.
1 comments

But this argument seems analogous to someone saying that average height is less “correct” than the full original data set because every individual’s height is different. In one sense it’s not wrong, but it kind of misses the point of averaging. The full local metric tensor defined on the manifold is going to have the same “complexity” as the manifold itself; it’s a bad way of summarizing a model because it’s not any simpler than the model. Their approach is to average that metric tensor over the region of the manifold swept out by the training data, and they show that this average empirically reflects something meaningful about the underlying response manifold in problems that we’re interested in. Whether or not this average quantity can entirely reproduce that original manifold is kind of irrelevant (and indeed undesirable), the point is that it (a) represents something meaningful about the model and (b) it’s low dimensional enough for a human to reason about. Although globally it will not be accurate to distances along the surface, presumably it is “good enough” to at least first order for much of the support of the training data.