
What permutation invariance gets you is that the model doesn't arbitrarily prefer one (graph) configuration over another, but that seems tangential. The difference between GFlowNets and RL is in what we do with the reward:

- RL says, give me a reward and I'll give you its max.

- GFlowNet says, give me a reward and I'll give you all its modes (via p(x) \propto R(x)).
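To make the contrast concrete, here is a toy sketch in Python. It is not a GFlowNet: it just enumerates a tiny space and samples from the target distribution p(x) \propto R(x) directly, which is what a GFlowNet is trained to approximate when enumeration is infeasible. The reward table and function names are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical toy reward over five candidates; "b" and "d" are the two modes.
REWARD = {"a": 1.0, "b": 10.0, "c": 1.0, "d": 9.0, "e": 1.0}


def rl_style_answer(reward):
    """Return the single highest-reward candidate (the argmax)."""
    return max(reward, key=reward.get)


def target_distribution(reward):
    """Normalize rewards so that p(x) = R(x) / Z, i.e. p(x) is proportional to R(x)."""
    z = sum(reward.values())
    return {x: r / z for x, r in reward.items()}


def sample_proportional(reward, n, seed=0):
    """Draw n samples with probability proportional to reward and count them."""
    rng = random.Random(seed)
    candidates = list(reward)
    weights = [reward[x] for x in candidates]
    return Counter(rng.choices(candidates, weights=weights, k=n))


if __name__ == "__main__":
    print("argmax:", rl_style_answer(REWARD))
    print("target p(x):", target_distribution(REWARD))
    print("sample counts:", sample_proportional(REWARD, 10_000))
```

The argmax only ever returns "b", while the proportional sampler visits both "b" and "d" about as often as their rewards suggest, so the second mode is not lost.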

Yes, ideally you would have a loss (well, a reward/energy) that is invariant and operates directly on, e.g., the molecule rather than on some arbitrary ordering of its nodes.
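As a rough illustration of what "invariant" means here, the sketch below uses a made-up reward (sum of squared node degrees) that depends only on the graph's structure, and checks that relabeling the nodes never changes it. The reward, the helper names, and the 4-node path graph are all assumptions for the example, not anything from the paper.

```python
from itertools import permutations


def invariant_reward(num_nodes, edges):
    """Hypothetical reward: sum of squared node degrees (ignores node labels)."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return float(sum(d * d for d in degree))


def relabel(edges, perm):
    """Apply a node permutation: node i is renamed to perm[i]."""
    return [(perm[u], perm[v]) for u, v in edges]


# A 4-node path graph 0-1-2-3, written with one arbitrary node ordering.
PATH = [(0, 1), (1, 2), (2, 3)]

if __name__ == "__main__":
    rewards = {invariant_reward(4, relabel(PATH, p)) for p in permutations(range(4))}
    print(rewards)  # a single value if the reward is permutation invariant
```

A reward that instead read off, say, the degree of "node 0" would give different values under relabeling, which is the arbitrary-ordering dependence you want to avoid.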