by igorkraw
1566 days ago
Could you highlight the difference between this and training a permutation-invariant or equivariant policy network with standard supervised or RL methods, assuming I also have an invariant/equivariant loss function?

- RL says, give me a reward and I'll give you its max.
- GFlowNet says, give me a reward and I'll give you all its modes (by sampling x with p(x) ∝ R(x)).
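To make the "max vs. all modes" contrast concrete, here is a toy sketch, assuming a tiny hypothetical set of four objects whose rewards can be enumerated. It does not train anything: it just computes the two targets directly. A reward-maximizing policy ends up on one top-scoring object, while a GFlowNet is trained to sample from p(x) = R(x) / Z, which it approximates with a learned sequential policy when enumeration like this is impossible.

```python
import random
from collections import Counter

# Hypothetical toy rewards: two equally good modes ("B" and "C").
REWARDS = {"A": 1.0, "B": 5.0, "C": 5.0, "D": 0.5}

def rl_target(rewards):
    """What plain reward maximization aims for: one highest-reward object.
    With ties, max() keeps the first one it sees, so the other mode is dropped."""
    return max(rewards, key=rewards.get)

def gflownet_target(rewards):
    """The distribution a GFlowNet is trained to sample from: p(x) = R(x) / Z."""
    z = sum(rewards.values())  # partition function Z, computable here only because the set is tiny
    return {x: r / z for x, r in rewards.items()}

def sample(dist, n, seed=0):
    """Draw n objects from dist and count how often each appears."""
    rng = random.Random(seed)
    xs = list(dist)
    return Counter(rng.choices(xs, weights=[dist[x] for x in xs], k=n))

if __name__ == "__main__":
    print(rl_target(REWARDS))
    print(gflownet_target(REWARDS))
    print(sample(gflownet_target(REWARDS), 10_000))
```

Sampling from the proportional target keeps returning both "B" and "C" (and occasionally the low-reward objects), which is the diversity the GFlowNet objective is after.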
Yes, you would ideally have a loss (well, a reward/energy) that is invariant and operates directly on, e.g., the molecule rather than on some arbitrary ordering of its nodes.
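One way to check that a reward is invariant in this sense is to relabel the nodes and confirm the score does not move. Below is a minimal sketch on a toy "molecule" (atom labels plus an undirected edge list); both reward functions are hypothetical, chosen only to contrast a reward that depends on the graph with one that depends on how the nodes happen to be stored.

```python
import itertools

def permute(labels, edges, perm):
    """Move node i to position perm[i]: same graph, different node ordering."""
    new_labels = [None] * len(labels)
    for i, lab in enumerate(labels):
        new_labels[perm[i]] = lab
    new_edges = [(perm[u], perm[v]) for u, v in edges]
    return new_labels, new_edges

def invariant_reward(labels, edges):
    """Hypothetical reward built only from per-atom (type, degree) pairs,
    so reordering the nodes cannot change it."""
    degree = [0] * len(labels)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # One point per carbon with exactly two neighbours, half a point per oxygen.
    carbons = sum(1.0 for lab, d in zip(labels, degree) if lab == "C" and d == 2)
    return carbons + 0.5 * labels.count("O")

def order_dependent_reward(labels, edges):
    """Hypothetical bad reward: scores whatever atom is stored at index 0."""
    return 1.0 if labels[0] == "C" else 0.0

if __name__ == "__main__":
    labels, edges = ["C", "C", "O"], [(0, 1), (1, 2)]
    for perm in itertools.permutations(range(len(labels))):
        g = permute(labels, edges, perm)
        print(perm, invariant_reward(*g), order_dependent_reward(*g))
```

Every ordering gets the same invariant reward, while the order-dependent one flips between values, so a model trained against it would be learning about the node ordering rather than the molecule.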