Hacker News new | ask | show | jobs
by stellalo 2038 days ago
Actually, it looks like in the paper something else is going on other than subgradient steps: there is some more randomization going on, that can prevent some steps from being taken. So yeah, there is a connection with online subgradient, but also more to it :-)
1 comments

Thanks for the loss function reference! I wonder if there’s something waiting to be discovered here about doing gradient descent but only taking steps with some probability. Definitely something to think about, I can’t imagine this idea hasn’t been explored before. Thanks a lot for the insightful comments, I’ve definitely seen that work in a very new light after knowing about it for years!!
See quantile regression and hinge loss functions.