by currymj
2399 days ago
This is effectively how most of the constrained MDP stuff works, so it's not a bad intuition to think of it as just different kinds of reward shaping. In some approaches you write down the Lagrangian of the RL reward-maximizing problem, and the hard constraints then become soft penalty terms weighted by Lagrange multipliers (perhaps infinitely strong ones).
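To make the "Lagrangian as reward shaping" idea concrete, here is a minimal sketch on a one-step (bandit) toy rather than a full CMDP. Everything in it is an illustrative assumption: the hypothetical `Action` type, the helper names `lagrangian_reward`, `best_response` and `smallest_feasible_multiplier`, and the made-up reward/cost numbers. For a fixed multiplier `lam`, the constraint `cost <= budget` becomes the penalty `lam * (cost - budget)`, and the agent just maximizes the shaped reward. A bisection on `lam` then finds roughly the smallest penalty weight whose unconstrained best response respects the hard constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical toy action for a one-step problem.
    name: str
    reward: float  # task reward r(a) the agent wants to maximise
    cost: float    # constraint cost c(a); the hard constraint is cost <= budget


def lagrangian_reward(a: Action, lam: float, budget: float) -> float:
    # r(a) - lam * (c(a) - budget): the hard constraint turned into a soft
    # penalty whose weight is the Lagrange multiplier lam >= 0.
    return a.reward - lam * (a.cost - budget)


def best_response(actions, lam: float, budget: float) -> Action:
    # With lam fixed this is ordinary, unconstrained reward maximisation
    # of the shaped reward (ties go to the first action in the list).
    return max(actions, key=lambda a: lagrangian_reward(a, lam, budget))


def smallest_feasible_multiplier(actions, budget: float,
                                 lam_max: float = 1e6, iters: int = 60) -> float:
    # The cost of the best response is non-increasing in lam, so bisection
    # finds (approximately) the smallest penalty weight that makes the
    # greedy choice satisfy the hard constraint.
    if best_response(actions, 0.0, budget).cost <= budget:
        return 0.0  # constraint is not binding, no penalty needed
    if best_response(actions, lam_max, budget).cost > budget:
        raise ValueError("no feasible best response up to lam_max")
    lo, hi = 0.0, lam_max  # invariant: lo infeasible, hi feasible
    for _ in range(iters):
        mid = (lo + hi) / 2
        if best_response(actions, mid, budget).cost <= budget:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    actions = [Action("risky", reward=10.0, cost=5.0),
               Action("safe", reward=6.0, cost=1.0)]
    budget = 2.0
    print(best_response(actions, 0.0, budget).name)  # risky: no penalty yet
    lam = smallest_feasible_multiplier(actions, budget)
    print(round(lam, 6))                             # 1.0
    print(best_response(actions, lam, budget).name)  # safe
```

In this toy the switch happens where `10 - 3*lam == 6 + lam`, i.e. at `lam = 1`. One caveat the sketch does not show: with only deterministic choices the agent jumps straight to "safe" and leaves some of the budget unused, whereas the constrained optimum here would randomize between the two actions, which is one reason practical constrained-RL methods use stochastic policies.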