
by dpcx 3190 days ago

This seems like a great introduction to the history. I have a problem with it, though.

In the first example, the method compute_error_for_line_given_points is called with values 1, 2, [[3,6],[6,9],[12,18]]. Where did those values come from?

Later in that same example, there is an "Error = 4^2 + (-1)^2 + 6^2". Where did those values come from?

Later, there's another form: "Error = x^5 - 2x^3 -2" What about these?

There seem to be magic formulae everywhere, with no real explanation in the article about where they came from. Without that, I have no way of actually understanding this.

Am I missing something fundamental here?

4 comments

twillmas 3190 days ago

I'd also like to see more of a "teaching" post that can walk through the math incrementally.

Many of the deep learning courses assume "high school math", but my school must have skipped matrices, so I've been watching Khan Academy videos.

Are there any good posts / books on walking through the math of deep learning from a true beginner's perspective?


yorwba 3190 days ago

The other replies are already telling you that these are just examples. I want to stress that these are completely unrelated examples, which is bad form IMO.

If the first example had been kept, then the second would have been "Error = (6 - (2·3 + 1))² + (9 - (2·6 + 1))² + (18 - (2·12 + 1))² = (-1)² + (-4)² + (-7)² = 66", which is what compute_error_for_line_given_points evaluates to.
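For anyone who wants to check that arithmetic, here is a minimal sketch. The article's function body is not quoted in this thread, so `squared_error` is a hypothetical stand-in that assumes a plain sum of squared differences (no averaging), which is what the 66 above corresponds to.

```python
# Hypothetical stand-in for the article's function; assumes a plain sum of squares.
def squared_error(b, m, points):
    """Return the per-point differences y - (m*x + b) and the sum of their squares."""
    residuals = [y - (m * x + b) for x, y in points]
    return residuals, sum(r ** 2 for r in residuals)

points = [[3, 6], [6, 9], [12, 18]]
residuals, error = squared_error(1, 2, points)  # b = 1, m = 2, i.e. the line y = 2x + 1
print(residuals)  # [-1, -4, -7]
print(error)      # 66
```

The argument order (intercept first, then slope) follows the call quoted in the thread.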

The third would have been "Error = (6 - (m·3 + b))² + (9 - (m·6 + b))² + (18 - (m·12 + b))² = 3·b² + 42·b·m - 66·b + 189·m² - 576·m + 441" and its derivative would have to be taken in two directions, giving "dError/dm = 42·b + 378·m - 576" and "dError/db = 6·b + 42·m - 66". Visualizing that slope would require a 3D plot.
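A quick way to convince yourself of the expansion and the two partial derivatives is to compare them against the raw sum numerically. This is only a sketch: the function names are made up for illustration, and the finite-difference check is an assumption of mine, not something from the article.

```python
points = [(3, 6), (6, 9), (12, 18)]

def error_sum(m, b):
    """Squared error written as the raw sum over the three points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)

def error_poly(m, b):
    """The same error after expanding the squares, as quoted above."""
    return 3 * b**2 + 42 * b * m - 66 * b + 189 * m**2 - 576 * m + 441

def gradient(m, b):
    """The quoted partial derivatives: (dError/dm, dError/db)."""
    return 42 * b + 378 * m - 576, 6 * b + 42 * m - 66

def numeric_gradient(m, b, h=1e-6):
    """Central finite differences on the raw sum, as an independent check."""
    d_m = (error_sum(m + h, b) - error_sum(m - h, b)) / (2 * h)
    d_b = (error_sum(m, b + h) - error_sum(m, b - h)) / (2 * h)
    return d_m, d_b

m, b = 2, 1
print(error_sum(m, b), error_poly(m, b))  # 66 66
print(gradient(m, b))                     # (222, 24)
print(numeric_gradient(m, b))             # close to (222, 24), up to floating-point noise
```

Both derivatives are positive at m = 2, b = 1, so lowering either the slope or the intercept a little reduces the error.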


letlambda 3190 days ago

>Am I missing something fundamental here?

Yeah, these aren't magic formulae, they are just examples.

>In the first example, the method compute_error_for_line_given_points is called with values 1, 2, [[3,6],[6,9],[12,18]]. Where did those values come from?

It's an example. The first two arguments define a line y = 2x + 1, the pairs are (x,y) points being used to compute the error.

"To play with this, let’s assume that the error function is Error=x^5−2x^3−2"

This is just an example of a function used as exposition to talk about derivatives.

It isn't even an error function though. An error function has to be a function of at least two variables.
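To make the derivative part concrete for the quoted function Error = x^5 - 2x^3 - 2: by the power rule its slope is 5x^4 - 6x^2. The sketch below compares that with a numerical estimate; the function names and the sample x values are arbitrary choices for illustration.

```python
def f(x):
    """The example function quoted from the article."""
    return x**5 - 2 * x**3 - 2

def f_prime(x):
    """Its derivative by the power rule: 5x^4 - 6x^2."""
    return 5 * x**4 - 6 * x**2

def numeric_slope(x, h=1e-6):
    """Central finite-difference estimate of the slope at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-1.0, 0.0, 1.0, 2.0):
    # The analytic and numeric slopes should agree to several decimal places.
    print(x, f_prime(x), numeric_slope(x))
```

The sign of the slope tells you which way to move x to make the function smaller, which is presumably why the article uses it to talk about derivatives.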


emilwallner 3190 days ago

Good point. They are all example data. The "[[3,6],[6,9],[12,18]]" can be thought of as the coordinates of a comet; 2 is your predicted correlation (the slope) and 1 is your predicted constant (the y-intercept). In this case, you want to change 2 and 1 to find the combination that results in the lowest error. It's the same with "Error = 4^2 + (-1)^2 + 6^2": it's an example of an error function. Does that make sense?
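One way to "change 2 and 1" systematically is gradient descent: repeatedly nudge the slope and intercept against the direction of the error's slope. The sketch below is an illustration under my own assumptions (plain sum of squared errors, a hand-picked learning rate and step count) and is not taken from the article.

```python
points = [(3, 6), (6, 9), (12, 18)]

def squared_error(m, b):
    """Sum of squared differences between each y and the line m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)

def gradient(m, b):
    """Partial derivatives of the squared error with respect to m and b."""
    d_m = sum(-2 * x * (y - (m * x + b)) for x, y in points)
    d_b = sum(-2 * (y - (m * x + b)) for x, y in points)
    return d_m, d_b

m, b = 2.0, 1.0          # start from the guess used in the example
learning_rate = 0.004    # assumed value; noticeably larger values diverge on this data
for _ in range(10000):
    d_m, d_b = gradient(m, b)
    # Update both parameters from gradients computed at the same point.
    m -= learning_rate * d_m
    b -= learning_rate * d_b

print(round(m, 3), round(b, 3))       # 1.357 1.5
print(round(squared_error(m, b), 3))  # 0.643
```

So on these three points the error drops from 66 for the line y = 2x + 1 to roughly 0.64 for a line with slope about 1.36 and intercept 1.5.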
