by tmyklebu
3093 days ago
> You mention Newton’s method, but of course that requires second order information which, as I mentioned, is not generally workable in high dimensions.

Why would you say that second-order information is "not generally workable in high dimensions"? We regularly run Newton's method on problems with tens of millions of variables today. And Newton's method isn't the only way to use second-order information. Hessian-vector products, for example, are easy to compute with the same reverse-mode differentiation that's so popular today, at only a constant factor more time than a gradient.

> You have to be careful with quasi-Newton methods like conjugate gradient for the same reason.

What reason, exactly, is that?
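A minimal sketch of the Hessian-vector product point, assuming JAX as the reverse-mode framework (the comment doesn't name a tool). Differentiating the reverse-mode gradient forward along a direction v yields H(x)v at a small constant multiple of a gradient's cost, and the n-by-n Hessian is never formed. Handing those products to conjugate gradient gives a matrix-free Newton step (usually called Newton-CG or truncated Newton), which is one common way Newton-type methods scale to very many variables. The objective `f`, the helpers `hvp` and `newton_cg`, and the problem size are hypothetical choices for illustration. There is no line search, so this is not a robust solver.

```python
from functools import partial

import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

# Use float64 so the convergence check is meaningful.
jax.config.update("jax_enable_x64", True)


def f(x):
    """Hypothetical smooth, strictly convex test objective (illustration only)."""
    t = jnp.linspace(-2.0, 2.0, x.shape[0])
    coupling = 0.5 * jnp.sum((x[1:] - x[:-1]) ** 2)  # couples neighbouring variables
    softplus = jnp.sum(jnp.logaddexp(0.0, x))          # log(1 + e^x), convex
    ridge = 0.5 * jnp.dot(x, x)                        # keeps the Hessian >= I
    return coupling + softplus + ridge - jnp.dot(t, x)


def hvp(f, x, v):
    """H(x) @ v via forward-over-reverse: a JVP of the reverse-mode gradient.

    Costs a small constant multiple of one gradient evaluation; H is never built.
    """
    return jax.jvp(jax.grad(f), (x,), (v,))[1]


def newton_cg(f, x0, iters=10):
    """Undamped Newton iterations whose linear systems are solved by CG.

    CG only needs the map v -> H(x) v, so memory stays O(n) instead of O(n^2).
    """
    grad_f = jax.grad(f)
    x = x0
    for _ in range(iters):
        g = grad_f(x)
        # Solve H(x) d = -g matrix-free; `cg` accepts a callable linear operator.
        d, _ = cg(partial(hvp, f, x), -g, tol=1e-10)
        x = x + d
    return x


if __name__ == "__main__":
    n = 1_000
    x_star = newton_cg(f, jnp.zeros(n))
    print("gradient norm at solution:", float(jnp.linalg.norm(jax.grad(f)(x_star))))
```

The only second-order object the solver touches is `hvp`, so each CG iteration costs roughly one gradient-sized pass and memory stays linear in n. A less benign objective would need a line search or trust region on top of this.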
Could you provide a reference to a problem with 10^7 variables that is being optimized with Newton’s method? I’d be indebted.