by beagle3 5022 days ago
Ok, that clears it up:

He doesn't need to set W(0,0) to 1 specifically because he sets x0 to 0 (which guarantees a non-zero value in the covariance matrix).

But the standard way to do L2 regularization (also known as "ridge regression") is to add a scaled identity matrix, so that every entry on the diagonal is nonzero.
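
For concreteness, here is a minimal sketch of the lambda*identity form, assuming the usual normal-equations setup (X^T X + lambda*I) w = X^T y. The names `solve` and `ridge_fit` and the toy data are made up for illustration, and exact fractions are used only to keep the arithmetic easy to check by hand.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination in exact arithmetic."""
    n = len(A)
    # Augmented matrix [A | b]
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[n] for row in M]

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y (hypothetical helper)."""
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(d):
        gram[i][i] += lam  # the scaled identity: lam on every diagonal entry
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(gram, rhs)

# Toy data with two identical columns, so X^T X alone is singular.
X = [[1, 1], [2, 2], [3, 3]]
y = [2, 4, 6]

try:
    ridge_fit(X, y, 0)
except ValueError as err:
    print("lam=0:", err)            # lam=0: matrix is singular

print("lam=1:", ridge_fit(X, y, 1))  # both weights equal 28/29
```

With lam > 0 the system is solvable even though the two columns are duplicates, which is the practical appeal of putting lambda on the whole diagonal.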

1 comments

You mean set x0 to 1, right?

People who do linear regression at work don't add a x0 feature? During the lecture the prof. only said that adding x0=1 for all m samples is a convention that helps simplify the computation. Unless I missed something during the lecture, that's the only explanation that was given.

Yes, I did, thanks.

> People who do linear regression at work don't add a x0 feature?

Sometimes they do that; sometimes the data already has a subset of features known to sum to 1 (e.g., if you have binary variables that reflect "one of n choices", exactly one of which must be set), and in this case adding x0=1 makes things worse (from a numerical perspective) for many algorithms, because x0 is then an exact linear combination of those columns.

Regardless, I've always seen regularization theory stated with lambda*identity matrices.
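
To illustrate the "one of n choices" case above, here is a small sketch under assumed toy data: three indicator columns that sum to 1 in every row, with and without an extra x0=1 column. The `rank` helper is a hypothetical hand-rolled routine, written with exact fractions so the result does not depend on floating-point tolerance.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination in exact arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Each row sets exactly one of three indicators (made-up data).
one_hot = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
# Same data with a constant x0 = 1 column prepended.
with_x0 = [[1] + row for row in one_hot]

print(len(one_hot[0]), rank(one_hot))  # 3 3
print(len(with_x0[0]), rank(with_x0))  # 4 3
```

The second matrix has four columns but rank three, so X^T X is singular and the unregularized normal equations have no unique solution. Adding lambda*identity with lambda > 0 restores invertibility, though it does not tell you which of the redundant columns to drop.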