by geye1234 408 days ago
Mathematical ignoramus writing here, but I have a long-term project to correct my ignorance of statistics, so this seems a good place to start.

He isn't talking about how to calculate a linear regression, correct? He's talking about why minimizing the squared distances between the data points and our line is preferred over minimizing the absolute distances. Also, I don't think he explains why absolute distances can produce multiple results? These aren't criticisms, I am just trying to make sure I understand.
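
To make the "multiple results" point concrete for myself, here is a minimal sketch. The four points are made up for illustration (they are not from the article), and the function names are my own. It uses the same y = ax + c form and measures vertical distances only. On this data, several different lines tie for the smallest sum of absolute distances, while the sum of squared distances singles out one of them.

```python
# Hypothetical toy data: two points stacked at x=0 and two at x=1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]

def abs_loss(a, c):
    """Sum of absolute vertical distances from the line y = a*x + c."""
    return sum(abs(y - (a * x + c)) for x, y in points)

def sq_loss(a, c):
    """Sum of squared vertical distances from the line y = a*x + c."""
    return sum((y - (a * x + c)) ** 2 for x, y in points)

# Three different candidate lines, written as (a, c).
candidates = [(0.0, 0.5), (0.5, 0.25), (-0.5, 0.75)]

abs_losses = [abs_loss(a, c) for a, c in candidates]
sq_losses = [sq_loss(a, c) for a, c in candidates]

for (a, c), al, sl in zip(candidates, abs_losses, sq_losses):
    print(f"y = {a}x + {c}: absolute = {al}, squared = {sl}")
```

All three lines score 2.0 on absolute distance, because any line that passes between the two stacked points at each x does equally well. On squared distance the flat line y = 0.5 scores 1.0 and the other two score 1.25, so only one line wins.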

ISTM that you have no idea how good your regression formula (y = ax + c) is without further info. You may have random data all over the place, and yet you will still come out with one linear regression to rule them all. His house price example illustrates this: square footage is, obviously, only one of many factors that influence price -- and also the most easily quantified factor by far. Wouldn't some measure of spread, such as the standard deviation of the errors around the line, be essential info to include?
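
A rough sketch of what that extra info could look like, using only the Python standard library. The data is synthetic and seeded, and `fit_line` is a helper name I made up. It fits y = ax + c by ordinary least squares and also reports the residual standard deviation and R-squared, two common measures of how tightly the points hug the line. This is one way to quantify fit, not necessarily what the article's author would use.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + c, plus two goodness-of-fit numbers."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = y_mean - a * x_mean
    ss_res = sum((y - (a * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    resid_sd = math.sqrt(ss_res / (n - 2))  # typical size of a miss
    r_squared = 1 - ss_res / ss_tot         # share of variance explained
    return a, c, resid_sd, r_squared

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(200)]

# Case 1: y has nothing to do with x (pure scatter).
ys_scatter = [rng.uniform(0, 10) for _ in xs]
# Case 2: y really does follow a line, with a little noise.
ys_trend = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

a1, c1, sd_scatter, r2_scatter = fit_line(xs, ys_scatter)
a2, c2, sd_trend, r2_trend = fit_line(xs, ys_trend)

print(f"scatter: a={a1:.2f} c={c1:.2f} sd={sd_scatter:.2f} R2={r2_scatter:.3f}")
print(f"trend:   a={a2:.2f} c={c2:.2f} sd={sd_trend:.2f} R2={r2_trend:.3f}")
```

Both cases hand back exactly one line, but the scatter case should show a large residual standard deviation and an R-squared near zero, while the trend case should show the opposite. The line alone doesn't tell you which situation you are in.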

Also, couldn't the fact that squared distance gives us only one result actually be a negative, since it can so easily oversimplify and therefore cut out a whole chunk of meaningful information?