by unishark
2123 days ago
> ... (X^T * X)^-1 ...

This is the matrix inversion I was referring to. Its size (at best) depends on the smaller of the number of parameters and the number of training samples, and both get very big in machine learning. When that happens you need some kind of low-memory iterative method, like Greville's algorithm or even gradient descent itself. So you're ultimately no better off.
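To make the memory argument concrete, here is a minimal pure-Python sketch, assuming plain least-squares regression, where the quoted `(X^T * X)^-1` appears in the normal-equations solution w = (X^T X)^-1 X^T y. The function names and toy data are hypothetical. `normal_equations` materializes the full p x p Gram matrix (and solves the system rather than forming the explicit inverse), while `gradient_descent` only touches X through matrix-vector products. Greville's algorithm is not shown.

```python
# Ordinary least squares two ways (hypothetical helper names, pure Python).

def matvec(X, w):
    """Return X @ w for a list-of-rows matrix X."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def rmatvec(X, r):
    """Return X^T @ r without materializing X^T."""
    out = [0.0] * len(X[0])
    for row, ri in zip(X, r):
        for j, x in enumerate(row):
            out[j] += x * ri
    return out


def normal_equations(X, y):
    """Solve (X^T X) w = X^T y. Builds the full p x p Gram matrix: O(p^2) memory."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = rmatvec(X, y)
    # Augmented matrix, then Gaussian elimination with partial pivoting.
    M = [gram[i] + [b[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * p
    for i in reversed(range(p)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, p))
        w[i] = (M[i][p] - s) / M[i][i]
    return w


def gradient_descent(X, y, lr=0.1, steps=2000):
    """Minimize (1/2n)||Xw - y||^2 with matrix-vector products only: O(n + p) extra memory."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        residual = [pred - yi for pred, yi in zip(matvec(X, w), y)]
        grad = rmatvec(X, residual)
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy data: y = 1 + 2x exactly, with an intercept column of ones.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w_ne = normal_equations(X, y)
w_gd = gradient_descent(X, y)
print(w_ne, w_gd)  # both approximately [1.0, 2.0]
```

On toy data both routes land near w = [1, 2]; the difference only matters at scale, where the p x p matrix (or the n x n matrix X X^T in the dual form, when samples are fewer than parameters) no longer fits in memory and you are pushed toward the iterative route anyway.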