Hacker News
by cs702 1563 days ago
I'm still wrapping my head around this (it seems rather magical; can it really be just that?)... but I find the idea of "sampling trajectories that reach a terminal state with probability proportional to a positive reward function, by minimizing the difference between the flow coming into each state and the flow coming out of it, which by construction must be equal" to be both beautiful and elegant -- like all great ideas.
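To make the flow-matching idea concrete, here is a minimal sketch on a hypothetical five-edge DAG. It assumes the per-state form of the objective: for every non-initial state, the inflow is matched against the reward plus the outflow, compared in log space with a small `eps`. The state names, flow values, and the `flow_matching_loss` helper are illustrative only, not taken from any library. Hand-set flows that conserve mass give zero loss.

```python
import math

# Hypothetical toy DAG: (parent, child) -> edge flow.
# "s0" is the initial state; terminal states x and y have rewards 3 and 1.
edge_flow = {
    ("s0", "a"): 2.0,
    ("s0", "b"): 2.0,
    ("a", "x"): 2.0,
    ("b", "x"): 1.0,
    ("b", "y"): 1.0,
}
reward = {"x": 3.0, "y": 1.0}


def flow_matching_loss(edge_flow, reward, initial="s0", eps=1e-8):
    """Sum over non-initial states of (log inflow - log(reward + outflow))^2."""
    states = {s for edge in edge_flow for s in edge}
    loss = 0.0
    for s in states:
        if s == initial:
            continue  # the initial state has no parents, so there is no inflow to match
        inflow = sum(f for (_, c), f in edge_flow.items() if c == s)
        outflow = sum(f for (p, _), f in edge_flow.items() if p == s)
        loss += (math.log(eps + inflow) - math.log(eps + reward.get(s, 0.0) + outflow)) ** 2
    return loss


print(flow_matching_loss(edge_flow, reward))  # 0.0: every state conserves flow
```

In an actual GFlowNet the edge flows would be produced by a trained model and the loss averaged over sampled trajectories. Here, changing a single flow (say `("a", "x")` to `1.0`) breaks conservation at both `a` and `x`, and the loss becomes positive.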

How did you and your coauthors come up with this? Trial and error? Or was there a moment of serendipitous creative insight?

--

To the moderators: My comment is now at the top of the page, but manux's comment above is more deserving of the top spot. I just upvoted it. Please consider pinning it at the top.


I think the inspiration came to me from looking at SumTrees and from having worked on Temporal Difference learning for a long time. The idea of flows came to Yoshua and me from the realization that we wanted some kind of energy conservation/preservation mechanism from having multiple paths lead to the same state.
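A sum tree, the structure used for prioritized experience replay, has a conservation property of its own: every internal node stores exactly the sum of its two children, so the mass at a node is split without loss among the nodes below it. The sketch below is a minimal version written for illustration (the `SumTree` class and its methods are hypothetical, and it assumes a power-of-two capacity); it only shows why the structure suggests flows and is not the authors' code.

```python
class SumTree:
    """Array-backed binary tree: leaves hold non-negative weights and each
    internal node holds the sum of its two children (capacity: power of two)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # index 1 is the root; leaves start at `capacity`

    def update(self, leaf, weight):
        i = leaf + self.capacity
        self.tree[i] = weight
        i //= 2
        while i >= 1:
            # Restore the invariant on the way up: parent = left child + right child.
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]

    def find(self, u):
        """Return the leaf whose cumulative-weight interval contains u (0 <= u < total())."""
        i = 1
        while i < self.capacity:
            left = self.tree[2 * i]
            if u < left:
                i = 2 * i
            else:
                u -= left
                i = 2 * i + 1
        return i - self.capacity


tree = SumTree(4)
for leaf, w in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(leaf, w)
print(tree.total())    # 10.0
print(tree.find(3.0))  # 2: cumulative intervals are [0,1), [1,3), [3,6), [6,10)
```

Drawing `u` uniformly from `[0, total())` and calling `find(u)` descends from the root, going left with probability proportional to the left child's sum, so each leaf is reached with probability proportional to its weight.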

But yes, in the moment it felt like a very serendipitous insight!

Thank you. It's nice to hear that you had one of those shower/bathtub Eureka moments!

> ...we wanted some kind of energy conservation/preservation mechanism from having multiple paths lead to the same state

Makes sense. FWIW, to me this looks like a conservation law, as in physics. I mean, it's not that the flows "must be" conserved, but that they are conserved (or go into sinks). Any physicists interested in AI should be all over this; it's right up their alley.
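The "or go into sinks" part has a direct reading: treat each terminal state's reward as flow leaving into a sink. Here is a minimal sketch, assuming the forward policy that moves from `s` to child `c` with probability F(s->c)/F(s) and stops with probability R(s)/F(s), where F(s) is the total outflow including the sink. On a hypothetical toy DAG (names and numbers are illustrative), exact `Fraction` arithmetic shows that conserved flows yield the terminal distribution R(x)/Z, where Z = 4 is the flow leaving `s0`.

```python
from fractions import Fraction

# Hypothetical toy DAG with conserved flows: every interior state's inflow
# equals its outflow, and each terminal's reward is its flow into the sink.
edge_flow = {
    ("s0", "a"): Fraction(2),
    ("s0", "b"): Fraction(2),
    ("a", "x"): Fraction(2),
    ("b", "x"): Fraction(1),
    ("b", "y"): Fraction(1),
}
reward = {"x": Fraction(3), "y": Fraction(1)}


def terminal_distribution(state, prob=Fraction(1), out=None):
    """Push probability mass forward from `state`: stop with probability
    R(s)/F(s) (flow into the sink) or move to child c with probability F(s->c)/F(s)."""
    if out is None:
        out = {}
    children = [(c, f) for (p, c), f in edge_flow.items() if p == state]
    r = reward.get(state, Fraction(0))
    total_out = r + sum(f for _, f in children)
    if r > 0:
        out[state] = out.get(state, Fraction(0)) + prob * r / total_out
    for child, f in children:
        terminal_distribution(child, prob * f / total_out, out)
    return out


dist = terminal_distribution("s0")
print(dist)  # {'x': Fraction(3, 4), 'y': Fraction(1, 4)}: proportional to rewards 3 and 1
```

Nothing here forces conservation; it holds because the flows were chosen that way. If, say, `("a", "x")` were set to a different value, the policy would still be well defined, but terminal probabilities would no longer be proportional to the rewards.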