Hacker News
by
aabaker99
1721 days ago
1. Gradient descent almost always finds a non-optimal local minimum (it is not guaranteed to find a global minimum).
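To make the local-vs-global point concrete, here is a minimal sketch. The function f(x) = x^4 - 3x^2 + x, the step size, and the starting points are all assumptions picked for illustration, not anything from the thread. Plain gradient descent lands in a different minimum depending on where it starts:

```python
def f(x):
    # Hypothetical non-convex objective with two minima.
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    # Fixed-step gradient descent from the starting point x0.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

right = gradient_descent(2.0)   # settles in the shallower minimum near x = 1.13
left = gradient_descent(-2.0)   # settles in the deeper minimum near x = -1.30

print(f"start  2.0 -> x = {right:.4f}, f(x) = {f(right):.4f}")
print(f"start -2.0 -> x = {left:.4f}, f(x) = {f(left):.4f}")
```

Both runs stop where the gradient vanishes, but only the run started at -2.0 reaches the lower of the two values. This is a 1-D toy; it says nothing by itself about how often this happens in high-dimensional models.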
agnosticmantis
1721 days ago
Isn’t the current best practice to train highly over-parametrized models to zero training error? That’d be a global optimum, no?
Unless we’re talking about the optimum of test error.
aabaker99
1721 days ago
If you find a zero of a non-negative function, I would call that a global minimum, yes.
aledalgrande
1721 days ago
Yeah, but depending on the data you might get even worse results. Selecting a subset that is actually representative is really important.
jhgb
1721 days ago
Would a random sample be representative? Statistically this seems to be the case for any large N. In fact it's not clear to me that any other sample would be more representative.
aledalgrande
1720 days ago
Many public datasets have skewed classes so if you take a random approach you're not gonna have a good result. And N might not be big enough anyway.
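A rough sketch of the skewed-class issue, assuming a made-up dataset with a 99:1 class split and a hypothetical helper named stratified_sample (neither comes from the thread). A uniform random sample matches the class ratio only in expectation, so a small sample can end up with few or no minority examples, while a per-class (stratified) draw pins the counts:

```python
import random
from collections import Counter, defaultdict

def stratified_sample(labels, n, rng):
    # Hypothetical helper: draw roughly n indices, keeping class proportions
    # and taking at least one example from every class.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    picked = []
    for idxs in by_class.values():
        k = max(1, round(n * len(idxs) / len(labels)))
        picked.extend(rng.sample(idxs, min(k, len(idxs))))
    return picked

rng = random.Random(0)
labels = [0] * 990 + [1] * 10  # assumed 99:1 skew, for illustration only

uniform = rng.sample(range(len(labels)), 100)
stratified = stratified_sample(labels, 100, rng)

uniform_counts = Counter(labels[i] for i in uniform)
stratified_counts = Counter(labels[i] for i in stratified)

print("uniform:   ", dict(uniform_counts))
print("stratified:", dict(stratified_counts))
```

The stratified draw always contains 99 majority and 1 minority example here; the uniform draw varies with the seed. Stratifying is only one reading of "selecting the right subset", and with a large enough N the uniform sample gets close to the same proportions.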