by peripitea
2403 days ago
Yes, that seems like an important problem, but it's separate from what they're describing in OP's article. (Again, assuming I'm understanding this right.) Their constrained RL approach still relies on our ability to enumerate the undesirable behaviors and assign costs to them, right? From reading the article, I get the impression that they are focused on that scenario and are leaving the problem of how to enumerate all undesirable behaviors to separate research.