

by cjhanks 2236 days ago

If you have ever opened up Excel or a similar program, one of the more useful options is to generate a regression line-fit on your data points.

One option is to specify a polynomial function, where you can choose how many coefficients you want. One of the reported measurements is the mean squared error between the line-fit and the points.

You can add as many polynomial coefficients as you want, and you will be able to decrease the mean squared error. But the more polynomial terms you choose, the more two things will be true:

1. The line-fit will be far more likely to go through the points.

2. At points along the line where there was no data, the line will be a worse approximation of the underlying physical reality.

That same mathematical property is what is relevant here. There is nothing inherently evil about non-linearity when the non-linear math model properly maps to the physical reality. But when you overfit a line, many of the functional solutions may be completely wrong.
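A minimal sketch of the effect described above, in plain Python. The data is hypothetical: an assumed "reality" of y = 2x + 1 with a made-up alternating error of +/-0.5. It compares a least-squares line against the degree-5 polynomial that passes through all six points. That polynomial is built by Lagrange interpolation as the extreme case of adding coefficients; this is an illustration, not how Excel computes its trendline.

```python
# Hypothetical "physical reality" (an assumption for this sketch): y = 2x + 1.
def truth(x):
    return 2.0 * x + 1.0

# Six samples with a fixed, made-up alternating error of +/-0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [truth(x) + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

def fit_line(xs, ys):
    """Ordinary least-squares line (2 coefficients), closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """Degree len(xs)-1 polynomial through every point (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return p

def mse(f, xs, ys):
    """Mean squared error of f on the given points."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(xs, ys)
poly = fit_interpolant(xs, ys)

print("MSE on data: line =", mse(line, xs, ys), "poly =", mse(poly, xs, ys))
for x in (0.5, 6.0):  # points where there was no data
    print("x =", x, "truth =", truth(x), "line =", line(x), "poly =", poly(x))
```

With these made-up numbers the polynomial has zero error on the data, yet at x = 0.5 it returns roughly 0.75 where the assumed reality is 2.0, while the line stays within about 0.2 of it.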

1 comments

EE84M3i 2236 days ago

I'm confused. I agree that overfitting can lead to very bad models.

But what I don't understand is this: I thought that "linear" in ML contexts was normally used in the sense of 'linear transformations' [1], and the 'line-fit' from Excel isn't linear in that sense -- it's affine.
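A small sketch of that distinction, assuming a hypothetical slope of 2 and offset of 1: a map that is linear in the strict sense satisfies f(a + b) = f(a) + f(b), and a line fit with a non-zero intercept does not.

```python
# Hypothetical coefficients, chosen only for illustration.
def linear(x, m=2.0):
    """Linear in the strict sense: no constant offset."""
    return m * x

def affine(x, m=2.0, b=1.0):
    """The shape of a typical line fit: slope plus intercept."""
    return m * x + b

def is_additive(f, a, b):
    """Check f(a + b) == f(a) + f(b) for one pair of inputs."""
    return f(a + b) == f(a) + f(b)

print(is_additive(linear, 3.0, 4.0))  # True:  14 == 6 + 8
print(is_additive(affine, 3.0, 4.0))  # False: 15 != 7 + 9
```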

Is a linear model with thousands/millions of weights/parameters (like deep learning models) really substantially simpler to understand? Can it do anything useful?
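One way to make the question concrete, as a sketch with made-up weight matrices `W1` and `W2`: if every layer is a plain matrix multiply with no offset and no activation, two layers give the same outputs as a single layer whose matrix is the product of the two.

```python
def apply(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product A @ B for list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical weights: W1 maps 2 -> 3 dimensions, W2 maps 3 -> 2.
W1 = [[1, 2], [0, 1], [3, -1]]
W2 = [[1, 0, 2], [-1, 1, 0]]
v = [5, -2]

two_layers = apply(W2, apply(W1, v))   # feed v through both layers
one_layer = apply(matmul(W2, W1), v)   # single layer with the product matrix

print(two_layers, one_layer)  # both are [35, -3]
```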

[1]: https://en.wikipedia.org/wiki/Linear_map


cjhanks 2236 days ago

I suppose from the perspective of someone implementing these models, yeah - it is linear, but it is not bijective. In a system with only one layer, that manifests as an alias (assuming the output dimensions are smaller). In a system with multiple layers of either `N->M` or `M->N`, those aliases tend to manifest as apparent "non-linearities".
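A rough sketch of the single-layer aliasing, assuming a made-up 3 -> 2 weight matrix `W`: because the output has fewer dimensions than the input, two different inputs can land on the same output, so the map cannot be inverted.

```python
def apply(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical layer mapping 3 inputs to 2 outputs.
W = [[1, 0, 1],
     [0, 1, 1]]

u = [2, 3, 4]
null_dir = [1, 1, -1]                      # W sends this direction to [0, 0]
v = [a + b for a, b in zip(u, null_dir)]   # a different input: [3, 4, 3]

print(apply(W, null_dir))        # [0, 0]
print(apply(W, u), apply(W, v))  # [6, 7] [6, 7], so u and v are aliases
```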

So, I guess looking from the bottom up the system may look non-continuous and linear. But if you look from the top down, it would look continuous and non-linear.

Really, I am not sure which one is "true".
