HN Mirror



	by currymj 2399 days ago
	This is how most constrained MDP methods effectively work, and it's not a bad intuition to think of them as different kinds of reward shaping. In some approaches you write down the Lagrangian of the reward-maximizing RL problem, and the hard constraints then become soft penalties weighted by Lagrange multipliers (possibly infinitely strong ones).
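	A minimal sketch of the Lagrangian idea. The setup is my own assumption and does not come from the comment. It is a one-step problem (a 3-armed bandit) with known per-action reward r, cost c, and cost budget d. An entropy bonus with temperature tau is added so that the inner maximization has a closed-form softmax solution. The constrained problem "max E[r] s.t. E[c] <= d" becomes "max over pi, min over lam >= 0 of E[r - lam*c] + lam*d". So for any fixed lam, the policy is simply optimizing the shaped reward r - lam*c, and lam itself is tuned by projected dual gradient steps. The function name lagrangian_bandit and all numbers are illustrative only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def lagrangian_bandit(r, c, d, tau=0.1, lr=0.05, steps=2000):
    """Primal-dual sketch for: max_pi E[r] + tau*H(pi)  s.t.  E[c] <= d (one-step problem)."""
    lam = 0.0
    for _ in range(steps):
        # Primal: best entropy-regularized policy for the shaped reward r - lam*c.
        # For a one-step problem this is exactly a softmax over shaped rewards.
        pi = softmax((r - lam * c) / tau)
        # Dual: raise lam while the constraint is violated, lower it when there
        # is slack, and project back onto lam >= 0.
        lam = max(0.0, lam + lr * (pi @ c - d))
    pi = softmax((r - lam * c) / tau)
    return pi, lam

r = np.array([1.0, 2.0, 3.0])  # hypothetical reward per action
c = np.array([0.0, 1.0, 3.0])  # hypothetical cost per action
pi, lam = lagrangian_bandit(r, c, d=1.0)
print(pi.round(3), round(float(lam), 3), float(pi @ r), float(pi @ c))
```

	With these numbers the unconstrained policy would pick the cost-3 action. Here lam settles around 0.7 and the policy concentrates on the middle action, with an expected cost close to d. If d is set below every achievable cost (say d = -0.5), then pi @ c - d never drops below 0.5, so lam keeps growing. The hard constraint turns into an ever-stronger penalty, which is the "infinitely strong" case.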