Hacker News
by programjames 675 days ago
They're good for reinforcement learning. E.g., Cicero uses piKL, which samples actions according to

p ∝ anchor_policy * exp(utility / temperature)

The utility plays the role of negative "energy": in Boltzmann form, p ∝ exp(-energy / temperature), so higher utility means lower energy. The article ignores entropy, but you can add entropy regularization, as in soft actor-critic.
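
To make the sampling rule concrete, here is a rough sketch of p ∝ anchor_policy * exp(utility / temperature) for a small discrete action set. This is not Cicero's actual code: the function names (`pikl_style_policy`, `sample_action`) and the example numbers are made up for illustration, and it assumes the anchor policy and utilities are given as plain lists.

```python
import math
import random


def pikl_style_policy(anchor_policy, utilities, temperature):
    """Return p ∝ anchor_policy * exp(utility / temperature), normalized.

    Hypothetical helper for illustration only; inputs are plain lists
    with one entry per action.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space so large utility / temperature values do not overflow.
    logits = [
        (math.log(a) if a > 0 else -math.inf) + u / temperature
        for a, u in zip(anchor_policy, utilities)
    ]
    top = max(logits)
    if top == -math.inf:
        raise ValueError("anchor policy assigns zero probability to every action")
    # Subtracting the max leaves the normalized result unchanged.
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(policy, rng=random):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(policy)), weights=policy, k=1)[0]


# Made-up example: three actions, anchor prefers action 0, utility prefers action 2.
anchor = [0.5, 0.3, 0.2]
utility = [0.0, 1.0, 2.0]
for temperature in (0.01, 1.0, 1e6):
    policy = pikl_style_policy(anchor, utility, temperature)
    print(temperature, [round(p, 3) for p in policy], sample_action(policy))
```

A high temperature leaves the result close to the anchor policy, while a low temperature concentrates it on the highest-utility action the anchor still allows. That is the same trade-off the temperature controls in a Boltzmann distribution over energies.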