Hacker News new | ask | show | jobs
by cracker_jacks 1564 days ago
> The more dimensions, the simpler the surface. With enough dimensionality, everything is a straight line.

Even assuming this statement were true (which is a big if). That does not explain how it would outperform gradient descent.

1 comments

As dimensionality grows, data is more linearly sperable, so fewer dimensions are signficant in distinguishing the data (though for any pair of points, those dimensions will be different).

Grad desc, as an exhaustive search over all those dimensions "maybe" less performant in very high dimensions, where we have enough data to be "right some of the time" when randomly choosing which dim to discriminate on.

If we force zero loss, ie., an interpolation regime, then we're getting interpolation as usual. Can we get there faster when dimensionality increases?

It's plausible if count(relevant dimensions) << count(dimensions), and if they discriminating dimensions for any two random points is itself random.

I believe what you are describing is grid search, not gradient descent. Gradient descent is not an exhaustive search.
It is if you see vec_a . vec_b, as a traversal over the dimensions of both vectors.