by est31 2399 days ago
The approach you describe is mentioned in the article as "normal RL". Constrained RL is a more advanced variant of it where you get direct control over how often a given safety constraint may be violated. Basically, constrained RL just automates the part where you manually tune the "normal RL" penalties until the agent meets your constraints.
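
To make the "automating away the manual adjustment" part concrete, here is a rough toy sketch. It is not the article's algorithm, just one common way to do it (nudging a penalty weight up or down, Lagrangian-style). Everything in it is made up for illustration: the reward formula, the `best_risk_level` stand-in for a "normal RL" training run, and the names `tune_penalty`, `step_size` and `max_violation_rate`.

```python
def best_risk_level(penalty):
    """Stand-in for "normal RL": maximise reward - penalty * violation_rate.

    Toy assumption: a policy with risk level p in [0, 1] earns reward
    2*p - p**2 and violates the safety constraint a fraction p of the time.
    The maximiser of 2*p - p**2 - penalty*p is p = 1 - penalty/2, clipped
    to [0, 1].
    """
    return min(1.0, max(0.0, 1.0 - penalty / 2.0))


def tune_penalty(max_violation_rate, step_size=0.5, iterations=200):
    """Adjust the penalty automatically until the violation target is met."""
    penalty = 0.0
    for _ in range(iterations):
        # In this toy the violation rate equals the chosen risk level.
        violation_rate = best_risk_level(penalty)
        # Too many violations: raise the penalty. Too few: lower it.
        penalty += step_size * (violation_rate - max_violation_rate)
        penalty = max(0.0, penalty)  # a penalty must not turn into a bonus
    return penalty, best_risk_level(penalty)


penalty, violation_rate = tune_penalty(max_violation_rate=0.1)
print(f"penalty={penalty:.3f}, violation rate={violation_rate:.3f}")
```

With "normal RL" you would pick the penalty by hand, retrain, look at the violation rate and try again. Here you only state the target (10% in the example) and the loop finds the penalty for you; in this toy it settles at a penalty of about 1.8.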