Hacker News
by aabaker99 1721 days ago
1. Gradient descent almost always finds a non-optimal local minimum (it is not guaranteed to find a global minimum).
2 comments

Isn’t the current best practice to train highly over-parametrized models to zero training error? That’d be a global optimum, no?

Unless we’re talking about the optimum of the test error.

If you find a zero of a non-negative function, I would call that a global minimum, yes.
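To make that concrete, here is a minimal sketch, assuming a toy least-squares problem with more weights than samples. The data, step size, and step count are made up for illustration. Mean squared error is a sum of squares, so it can never be negative, and any weights that drive it to zero are a global minimum of the training loss. It says nothing about test error.

```python
# Toy over-parametrized least squares: 3 samples, 4 weights (hypothetical data).
X = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, -1.0],
    [1.0, 0.0, 2.0, 1.0],
]
y = [1.0, -2.0, 0.5]


def predict(w, row):
    return sum(wi * xi for wi, xi in zip(w, row))


def mse(w):
    # Mean squared error: a sum of squares, so it can never go below 0.
    return sum((predict(w, row) - t) ** 2 for row, t in zip(X, y)) / len(X)


def gradient_descent(steps=2000, lr=0.1):
    # Plain full-batch gradient descent, starting from all-zero weights.
    w = [0.0] * len(X[0])
    for _ in range(steps):
        residuals = [predict(w, row) - t for row, t in zip(X, y)]
        grad = [
            2.0 / len(X) * sum(r * row[j] for r, row in zip(residuals, X))
            for j in range(len(w))
        ]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


initial_loss = mse([0.0] * len(X[0]))
final_loss = mse(gradient_descent())
print(initial_loss, final_loss)
```

Because this loss is convex, gradient descent has no bad local minima to get stuck in here. Deep networks are not convex, so this only illustrates the "zero of a non-negative function" argument, not why over-parametrized networks reach zero training error in practice.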
Yeah, but depending on the data you might get even worse results. Selecting the right subset to be representative is really important.
Would a random sample be representative? Statistically this seems to be the case for any large N. In fact it's not clear to me that any other sample would be more representative.
Many public datasets have skewed classes, so if you sample at random you're not gonna get a good result. And N might not be big enough anyway.
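A rough sketch of the skewed-class problem, assuming a made-up dataset with a 99:1 class split and a sample size of 20. With only 10 minority examples out of 1000, roughly 80% of plain random samples of that size contain no minority example at all. `stratified_sample` is a hypothetical helper written for this example, not a library function. It samples each class separately and keeps at least one example per class.

```python
import random
from collections import defaultdict


def stratified_sample(labels, k, rng):
    """Pick about k indices, keeping class proportions and at least one per class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    picked = []
    for label, indices in by_class.items():
        # Proportional share, but never zero, so rare classes are not dropped.
        # This can return slightly more than k indices in total.
        share = max(1, round(k * len(indices) / len(labels)))
        picked.extend(rng.sample(indices, min(share, len(indices))))
    return picked


rng = random.Random(0)
labels = [0] * 990 + [1] * 10  # hypothetical 99:1 class skew

# How often does a plain random sample of 20 contain no minority example at all?
trials = 1000
misses = sum(
    1
    for _ in range(trials)
    if not any(labels[i] == 1 for i in rng.sample(range(len(labels)), 20))
)
miss_rate = misses / trials

strat = stratified_sample(labels, 20, rng)
strat_classes = {labels[i] for i in strat}
print(miss_rate, sorted(strat_classes))
```

With a much larger sample the plain random draw would usually pick up some minority examples too, which is the "large N" point above. The stratified version just makes that a guarantee rather than a matter of luck.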