| HN Mirror


	by aabaker99 1721 days ago
	1. Gradient descent almost always finds a non-optimal local minimum (it is not guaranteed to find a global minimum).

2 comments

agnosticmantis 1721 days ago

Isn’t the current best practice to train highly over-parametrized models to zero training error? That’d be a global optimum, no?

Unless we’re talking about the optimum of the test error.

aabaker99 1721 days ago

If you find a zero of a non-negative function, I would call that a global minimum, yes.
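
To make that concrete, here is a minimal sketch of plain gradient descent on an over-parametrized least-squares problem (3 weights, 2 training points). The toy data, the step size and the iteration count are made up for illustration; nothing here comes from a real model. The squared-error loss is non-negative, so driving it to (numerically) zero means a global minimum of the training loss was reached.

```python
# Over-parametrized least squares: 3 weights, only 2 training points.
# The data below is hypothetical toy data chosen for illustration.
X = [[1.0, 2.0, 0.5],
     [0.0, 1.0, -1.0]]
y = [1.0, -2.0]

def residuals(w):
    # Prediction minus target for every training point.
    return [sum(wj * xj for wj, xj in zip(w, row)) - target
            for row, target in zip(X, y)]

def loss(w):
    # Half the sum of squared residuals; never negative.
    return 0.5 * sum(r * r for r in residuals(w))

def gradient(w):
    # Gradient of the loss above: X^T (Xw - y).
    r = residuals(w)
    return [sum(r[i] * X[i][j] for i in range(len(X)))
            for j in range(len(w))]

w = [0.0, 0.0, 0.0]
learning_rate = 0.1  # assumed step size, small enough for this toy problem
for _ in range(2000):
    g = gradient(w)
    w = [wj - learning_rate * gj for wj, gj in zip(w, g)]

final_loss = loss(w)
print(final_loss)
```

This only says something about training error on a convex toy problem; it says nothing about test error, which is the caveat raised above.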

aledalgrande 1721 days ago

Yeah, but depending on the data you might get even worse results; selecting a representative subset is really important.

jhgb 1721 days ago

Would a random sample be representative? Statistically this seems to be the case for any large N. In fact it's not clear to me that any other sample would be more representative.

aledalgrande 1720 days ago

Many public datasets have skewed classes, so if you sample purely at random you're not gonna get a good result. And N might not be big enough anyway.
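
A rough sketch of the difference, assuming a made-up dataset with 990 examples of class 0 and 10 of class 1. The helper `stratified_sample` is a hypothetical name written for this example, not a library function. A uniform random subset can end up with very few (or zero) minority examples, while sampling per class keeps the proportions fixed.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(labels, fraction, seed=0):
    """Pick roughly `fraction` of the indices from each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        # Keep at least one example per class so rare classes never vanish.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return chosen

# Hypothetical skewed dataset: 99% class 0, 1% class 1.
labels = [0] * 990 + [1] * 10

# Uniform random subset of 100 examples, ignoring class membership.
random_idx = random.Random(0).sample(range(len(labels)), 100)
random_counts = Counter(labels[i] for i in random_idx)

# Stratified subset: 10% of each class.
stratified_idx = stratified_sample(labels, 0.1)
stratified_counts = Counter(labels[i] for i in stratified_idx)

print("random:    ", dict(random_counts))
print("stratified:", dict(stratified_counts))
```

With only 10 minority examples in total, even the stratified subset holds a single one, which is the "N might not be big enough" problem.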
