> You have a stochastic estimate since your loss is generally additive in the data.

Just to be clear, an additive loss doesn't imply a stochastic gradient estimate. Rather, because the loss function is additive, stochastic gradient estimates of the loss become possible. But this of course does not mean one has to use stochastic gradient estimates. It's just that it's easier to update and monitor progress this way, rather than computing the gradient term for every single example in the training set and then taking a descent step. The surprising thing is that stochastic gradient descent converges quickly in practice relative to proper gradient descent. All of the justification and whatnot for SGD in ML is largely post-hoc, because it works so unreasonably well and is so intuitive to anyone who has taken calculus.

The other aspect (in the context of optimization in machine learning) is that the optimization is performed over a loss on a training dataset, for which you really don't even want convergence to an exact minimum of the training loss. What you really care about is the expected generalization loss. Convergence to the exact minimum of the training loss doesn't necessarily guarantee the best generalization loss. I mention this because it contributes to the general aloofness towards optimization convergence rates in ML.

> I believe many have investigated quasi-Newton methods based on estimate gradients but I haven’t investigated that thoroughly.

Until semi-recently, quasi-Newton methods were not explored in the stochastic setting because of the question of how to extend the Wolfe conditions to this arena. There's been a bit of work on this [1], but I don't think it's caught on outside of the optimization community (not that it necessarily should, considering the points above).

[1]: https://arxiv.org/abs/1401.7020
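To make the "additive loss makes stochastic estimates possible" point concrete, here is a minimal sketch in NumPy. The toy least-squares problem, the names `full_gradient` and `minibatch_gradient`, and the batch size of 32 are all assumptions chosen for illustration, not anything from the discussion above. Because the loss is a mean over examples, the gradient on a random subset should be an unbiased estimate of the full gradient, so averaging many such estimates should land close to the exact value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: least squares on synthetic data.
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def full_gradient(w):
    """Gradient of the mean squared-error loss over the whole training set."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / n

def minibatch_gradient(w, batch_size):
    """Same gradient formula on a random subset of the examples."""
    idx = rng.choice(n, size=batch_size, replace=False)
    residual = X[idx] @ w - y[idx]
    return 2.0 * X[idx].T @ residual / batch_size

w = np.zeros(d)
exact = full_gradient(w)

# Each minibatch gradient is noisy, but their average approaches the exact one.
estimates = np.stack([minibatch_gradient(w, 32) for _ in range(5000)])
mean_estimate = estimates.mean(axis=0)
max_abs_error = np.max(np.abs(mean_estimate - exact))

print("exact gradient:        ", exact)
print("mean of SGD estimates: ", mean_estimate)
print("max abs difference:    ", max_abs_error)
```

Any single row of `estimates` can be far from `exact`; the point is only that the noise averages out, which is what lets you take cheap descent steps without touching every example.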
You also made an interesting comment about work not catching on outside of the optimization community - can you recommend some resources or websites to follow in order to see what the optimization community is working on? I've developed an interest in the area but don't really know where to go for "up to date" information.