HN Mirror



	by tadkar 2038 days ago
	And would the 2-memory algorithm be equivalent to gradient descent with momentum? I used to know what a subgradient was, but I think there must be something more to the ideas in the paper, because I’m struggling to see the analogy between the algorithm described and a gradient descent in which you take steps probabilistically. Perhaps I need to think about how you could recast the quantile estimation problem as an optimisation problem and then apply what is effectively the machinery developed to train neural nets. Very interesting connection!

2 comments

stellalo 2038 days ago

Recasting quantile estimation as an optimization problem is trivial: the q-quantile minimizes the expected “pinball” loss (see first eqn in http://statweb.stanford.edu/~owen/courses/305a/lec18.pdf) with parameter q. What they do in the paper is take subgradient steps with respect to the latest observation (just think of subgradients as gradients, since the loss function is differentiable everywhere except at one point).
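To make that concrete, here is a minimal Python sketch of the pinball loss and a plain online subgradient update on the latest observation. The function names, the constant step size `eta`, and the uniform test data are assumptions chosen for illustration; this is not claimed to be the paper's exact algorithm.

```python
import random


def pinball_loss(m, x, q):
    """Pinball (quantile) loss of estimate m for a single observation x."""
    diff = x - m
    return q * diff if diff >= 0 else (q - 1.0) * diff


def pinball_subgradient(m, x, q):
    """One subgradient of the pinball loss with respect to m."""
    if x > m:
        return -q
    if x < m:
        return 1.0 - q
    # At the kink any value in [-q, 1 - q] is a valid subgradient; 0 is one choice.
    return 0.0


def online_quantile(stream, q, eta=0.5, m0=0.0):
    """Hypothetical helper: one subgradient step per observation, constant step size."""
    m = m0
    for x in stream:
        m -= eta * pinball_subgradient(m, x, q)
    return m


rng = random.Random(0)
data = [rng.uniform(0.0, 100.0) for _ in range(20000)]
estimate = online_quantile(data, q=0.9)
print(estimate)  # should hover around the true 0.9-quantile, which is 90 here
```

With a constant step size the estimate keeps jittering around the quantile instead of settling on it; a decaying step size would trade that jitter for slower adaptation.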


zaroth 2037 days ago

I hate it when the complexity of the lingo dramatically exceeds the complexity of the algorithm. Language shouldn’t be the barrier to understanding.

This seems to be particularly true in computer learning. We’re talking about a conditional step function here, right?


eru 2037 days ago

The lingo is complex here, because it's general enough to be used for much more complicated cases.

Think of it as a 'hello world' program. The typical 'hello world' program in, e.g., Java teaches you more about the lingo of Java than about solving the problem of putting 'hello world' on the screen.

(Of course, there are still plenty of bad reasons to describe simple things in complex lingo. But the above is one good reason.)


stellalo 2038 days ago

Actually, it looks like something other than plain subgradient steps is going on in the paper: there is some extra randomization that can prevent some steps from being taken. So yeah, there is a connection with online subgradient methods, but also more to it :-)
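One way to picture "steps that are only taken with some probability" is the sketch below: a fixed-size step whose chance of being taken is q when the observation is above the estimate and 1 - q when it is below. In expectation this moves in the same direction as a pinball-loss subgradient step. The function name, the unit step size, and the test data are assumptions for illustration, and the paper's actual update rule may differ.

```python
import random


def randomized_quantile(stream, q, rng, step=1.0, m0=0.0):
    """Hypothetical sketch: fixed-size steps that are only taken with some probability."""
    m = m0
    for x in stream:
        u = rng.random()
        if x > m and u < q:
            # Observation above the estimate: move up with probability q.
            m += step
        elif x < m and u < 1.0 - q:
            # Observation below the estimate: move down with probability 1 - q.
            m -= step
    return m


rng = random.Random(1)
data = [rng.uniform(0.0, 100.0) for _ in range(50000)]
estimate = randomized_quantile(data, q=0.9, rng=rng)
print(estimate)  # should wander around the true 0.9-quantile, which is 90 here
```

The expected drift is q·P(x > m) − (1 − q)·P(x < m), which is zero exactly when P(x < m) = q, so the estimate is pulled toward the q-quantile while only ever storing a single number.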


tadkar 2038 days ago

Thanks for the loss function reference! I wonder if there’s something waiting to be discovered here about doing gradient descent but only taking steps with some probability. Definitely something to think about, though I can’t imagine this idea hasn’t been explored before. Thanks a lot for the insightful comments; I’m definitely seeing that work in a very new light after knowing about it for years!!


ppereira 2037 days ago

See quantile regression and hinge loss functions.
