|
|
|
|
|
by cracker_jacks
1564 days ago
|
|
> The more dimensions, the simpler the surface. With enough dimensionality, everything is a straight line. Even assuming this statement were true (which is a big if). That does not explain how it would outperform gradient descent. |
|
Grad desc, as an exhaustive search over all those dimensions "maybe" less performant in very high dimensions, where we have enough data to be "right some of the time" when randomly choosing which dim to discriminate on.
If we force zero loss, ie., an interpolation regime, then we're getting interpolation as usual. Can we get there faster when dimensionality increases?
It's plausible if count(relevant dimensions) << count(dimensions), and if they discriminating dimensions for any two random points is itself random.