by jefft255 2399 days ago
That works, but in the setting you describe, the agent can only learn to avoid these "bad" things by first making those mistakes and learning from them. There are mistakes we don't want the agent to make, ever, not even once during training. That's what safe RL is about.
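
To make the distinction concrete, here is a rough sketch, not taken from any particular safe RL paper or library. It assumes a made-up 1-D corridor (states 0 to 4, with a hazard at 0 and a goal at 4) and a hypothetical `train` function running plain tabular Q-learning. With `shielded=False` the agent only gets a large negative reward for entering the hazard, so it has to enter it to find out. With `shielded=True` unsafe actions are masked out before the agent chooses, which assumes we already know which transitions are unsafe. Masking is just one simple approach to safety, picked here for illustration.

```python
import random

HAZARD, START, GOAL = 0, 1, 4   # hypothetical 1-D corridor: states 0..4
ACTIONS = (-1, +1)              # step left or step right


def train(shielded, episodes=200, eps=0.2, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning; returns how many times the agent entered HAZARD."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    hazard_visits = 0
    for _episode in range(episodes):
        s = START
        for _step in range(50):  # cap on steps per episode
            # Shield: drop actions known to lead into the hazard.
            # Assumption: the unsafe transitions are known in advance.
            allowed = [a for a in ACTIONS
                       if not (shielded and s + a == HAZARD)]
            if rng.random() < eps:
                a = rng.choice(allowed)                     # explore
            else:
                a = max(allowed, key=lambda x: q[(s, x)])   # exploit
            s2 = s + a
            if s2 == HAZARD:
                r, done = -100.0, True   # the penalty is the only deterrent
                hazard_visits += 1
            elif s2 == GOAL:
                r, done = 1.0, True
            else:
                r, done = 0.0, False
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            if done:
                break
            s = s2
    return hazard_visits


if __name__ == "__main__":
    print("penalty only:", train(shielded=False), "hazard visits")
    print("shielded:    ", train(shielded=True), "hazard visits")
```

The penalty-only agent does end up avoiding the hazard most of the time, but only after paying for the lesson, and epsilon-greedy exploration keeps sending it back occasionally. The shielded agent never enters the hazard, at the cost of needing prior knowledge about what is unsafe.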