Related, I used to jokingly explain mean-square-error approximation like this: there is this geometrical theorem, that you can drive a line through any three points on a plane, as long as the line is thick enough. So mean-square-error approximation is basically minimizing the thickness of that line :).