by jampekka
408 days ago
The main practical reason squared error is minimized in ordinary linear regression is that it has an analytical solution, which makes it a bit of a weird example for gradient descent. There are plenty of error formulations that give a smooth loss function, and many that are even convex, but most don't have analytical solutions, so they are solved via numerical optimization like GD. The main message is IMHO correct though: squared error (and its implicit Gaussian noise assumption) is all too often used just out of convenience and tradition.
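
To make that concrete, here's a rough sketch in plain Python. Everything in it is my own illustration: the toy data, the function names, the learning rate, and the choice of Huber loss as the "smooth and convex but no closed form" example. For squared error the textbook formulas give the answer directly and GD just reproduces it. For Huber, as far as I know, there's no closed form, so you iterate.

```python
import random

def fit_closed_form(xs, ys):
    """Least-squares slope and intercept from the textbook formulas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_gd(xs, ys, grad_fn, lr=0.1, steps=5000):
    """Plain gradient descent on the mean of a per-residual loss.

    grad_fn(r) is the derivative of the loss w.r.t. the residual r.
    lr and steps are hand-picked for this toy data, not general advice.
    """
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        g_slope = g_int = 0.0
        for x, y in zip(xs, ys):
            g = grad_fn(slope * x + intercept - y)
            g_slope += g * x
            g_int += g
        slope -= lr * g_slope / n
        intercept -= lr * g_int / n
    return slope, intercept

def squared_grad(r):
    # d/dr of r**2
    return 2.0 * r

def huber_grad(r, delta=1.0):
    # d/dr of Huber loss: linear near zero, clipped to +/-delta in the tails
    return r if abs(r) <= delta else (delta if r > 0 else -delta)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus
# Gaussian noise, with one gross outlier added at the last point.
rng = random.Random(0)
xs = [i / 10 - 1 for i in range(21)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]
ys[-1] += 10

cf = fit_closed_form(xs, ys)           # squared error, analytical
gd = fit_gd(xs, ys, squared_grad)      # squared error, numerical
hub = fit_gd(xs, ys, huber_grad)       # Huber, numerical only

print("closed form:", cf)
print("GD squared: ", gd)
print("GD Huber:   ", hub)
```

The two squared-error fits should agree to many decimal places, which is why GD is a bit redundant there. The Huber fit should stay noticeably closer to the true slope of 2, because the single outlier drags the squared-error line much harder.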