by peripitea 2403 days ago
Yes, that seems like an important problem, but a separate one from what the article in the OP describes. (Again, assuming I'm understanding this right.) Their constrained RL approach still relies on our ability to enumerate the undesirable behaviors and assign costs to them, right? From reading the article, I get the impression that they are focused on that scenario and are leaving the problem of how to enumerate all undesirable behaviors to separate research.

Constrained RL is a way to say "thou shalt not murder", instead of saying "murder is utility -10000".
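
To make that distinction concrete, here is a toy sketch. It is not the article's method: real constrained RL algorithms optimize a learned policy under an expected-cost limit, whereas this just picks between two hand-written candidates. The policy names, the numbers, and both helper functions are made up for illustration.

```python
# Hypothetical toy setup: each candidate policy has an expected reward and an
# expected cost (how much of the forbidden behavior it does). Numbers are made up.
policies = {
    "safe": {"reward": 10.0, "cost": 0.0},
    "risky": {"reward": 20000.0, "cost": 1.0},
}


def pick_with_penalty(policies, penalty):
    """'Murder is utility -10000': fold the cost into the reward, then maximize."""
    return max(
        policies,
        key=lambda name: policies[name]["reward"] - penalty * policies[name]["cost"],
    )


def pick_with_constraint(policies, cost_limit):
    """'Thou shalt not murder': discard anything over the limit, then maximize reward."""
    feasible = {n: p for n, p in policies.items() if p["cost"] <= cost_limit}
    if not feasible:
        raise ValueError("no policy satisfies the cost limit")
    return max(feasible, key=lambda name: feasible[name]["reward"])


# With a big enough payoff, the penalty version happily pays the -10000.
penalty_choice = pick_with_penalty(policies, penalty=10000.0)
# The constrained version never considers the violating policy at all.
constrained_choice = pick_with_constraint(policies, cost_limit=0.0)
print(penalty_choice, constrained_choice)
```

The point of the sketch is only that a penalty can be outbid by a large enough reward, while a constraint cannot. Either way you still have to write down the cost function yourself, which is the enumeration problem discussed above.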