by cs702
1563 days ago
I'm still wrapping my head around this (it seems rather magical; can it really be just that?)... but I find the idea of "sampling from a distribution of trajectories that reach a terminal state whose probability is proportional to a positive reward function, by minimizing the difference between the flow coming into the trajectories and the flow coming out of them, which by construction must be equal" to be both beautiful and elegant -- like all great ideas. How did you and your coauthors come up with this? Trial and error? Or was there a moment of serendipitous creative insight?

To the moderators: My comment is now at the top of the page, but manux's comment above is more deserving of the top spot. I just upvoted it. Please consider pinning it at the top.
|
But yes, in the moment it felt like a very serendipitous insight!