by rohitarondekar
5023 days ago
When computing (X^TX)^(-1), a large number of features makes things slow, because computing the inverse of a matrix is slow. Also, unless you use the pseudo-inverse (pinv in Octave), you need to take care of degenerate cases. However, if you use regularization, i.e. replace (X^TX)^(-1) with (X^TX + lambda*W)^(-1), where lambda is the regularization parameter and W is a matrix of the form:

    |0 0 0|
    |0 1 0|
    |0 0 1|

i.e. the identity matrix with (0,0) set to 0, then this ensures that the matrix is now invertible. Regularization also helps reduce overfitting.

P.S. I'm an ML n00b doing the Machine Learning course on Coursera, so I might be unaware of more practical knowledge of the above. :D
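
For anyone who wants to try this, here is a minimal sketch of the regularized normal equation described above. It is written in Python with NumPy rather than Octave, and the function name ridge_normal_equation, the toy data, and lambda = 1.0 are all made up for illustration. The toy X has two identical feature columns, so X^TX on its own is singular, while X^TX + lambda*W can still be solved.

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve (X^T X + lam * W) theta = X^T y.

    W is the identity matrix with W[0, 0] set to 0, so the first
    (intercept) coefficient is not penalized. Hypothetical helper.
    """
    n_features = X.shape[1]
    W = np.eye(n_features)
    W[0, 0] = 0.0
    A = X.T @ X + lam * W
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)

# Toy data (assumed): an intercept column of ones plus two identical
# feature columns, which makes X^T X rank deficient.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

rank_plain = np.linalg.matrix_rank(X.T @ X)
theta = ridge_normal_equation(X, y, lam=1.0)

print("rank of X^T X:", rank_plain)
print("theta:", theta)
```

Using np.linalg.solve instead of an explicit inverse is a design choice of this sketch, not something from the course material; it avoids forming the inverse at all, which is usually cheaper and numerically safer.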
Note that your W does not guarantee invertibility - e.g., if the (0,0) entry of your original X^TX is already 0.
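
A small sketch of that degenerate case, in Python with NumPy. The matrices and lambda = 1.0 are made-up values for illustration. If the first column of X is all zeros, the (0,0) entry of X^TX is 0 and so is its whole first row and column; since W also has a 0 at (0,0), adding lambda*W leaves the matrix singular. With the usual intercept column of ones, the same check comes out full rank.

```python
import numpy as np

lam = 1.0  # assumed regularization parameter
W = np.eye(3)
W[0, 0] = 0.0  # identity with the (0,0) entry zeroed

# Hypothetical design matrix whose first column is all zeros,
# so (X^T X)[0, 0] == 0.
X_zero = np.array([[0.0, 1.0, 2.0],
                   [0.0, 3.0, 1.0],
                   [0.0, 2.0, 5.0]])
A_zero = X_zero.T @ X_zero + lam * W
det_zero_col = np.linalg.det(A_zero)
rank_zero_col = np.linalg.matrix_rank(A_zero)

# Same data, but with an intercept column of ones in the first position.
X_ones = X_zero.copy()
X_ones[:, 0] = 1.0
A_ones = X_ones.T @ X_ones + lam * W
rank_ones_col = np.linalg.matrix_rank(A_ones)

print("zero first column: det =", det_zero_col, "rank =", rank_zero_col)
print("ones first column: rank =", rank_ones_col)
```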