by Tyr42
2399 days ago
|
|
I think the answer is that "just building in costs" is actually rather hard to get right. Check out how Concrete Problems in AI Safety treats it (Section 6 in particular is about safe exploration): https://arxiv.org/pdf/1606.06565.pdf
Quote: "In practice, real world RL projects can often avoid these issues by simply hard-coding an avoidance of catastrophic behaviors. For instance, an RL-based robot helicopter might be programmed to override its policy with a hard-coded collision avoidance sequence (such as spinning its propellers to gain altitude) whenever it’s too close to the ground. This approach works well when there are only a few things that could go wrong, and the designers know all of them ahead of time. But as agents become more autonomous and act in more complex domains, it may become harder and harder to anticipate every possible catastrophic failure. The space of failure modes for an agent running a power grid or a search-and-rescue operation could be quite large. Hard-coding against every possible failure is unlikely to be feasible in these cases, so a more principled approach to preventing harmful exploration seems essential. Even in simple cases like the robot helicopter, a principled approach would simplify system design and reduce the need for domain-specific engineering."
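To make the hard-coded override from the quote concrete, here is a minimal sketch. The altitude threshold, the observation and action dictionaries, and the names `safe_act`, `climb_action`, and `learned_policy` are all assumptions made up for illustration; none of them come from the paper.

```python
from typing import Callable, Dict

Observation = Dict[str, float]
Action = Dict[str, float]

# Hypothetical threshold; a real system would derive this from the vehicle.
MIN_SAFE_ALTITUDE_M = 2.0


def climb_action() -> Action:
    """Hard-coded recovery sequence: full throttle, level attitude."""
    return {"throttle": 1.0, "pitch": 0.0, "roll": 0.0}


def safe_act(policy: Callable[[Observation], Action], obs: Observation) -> Action:
    """Use the learned policy unless the helicopter is too close to the ground."""
    if obs["altitude_m"] < MIN_SAFE_ALTITUDE_M:
        return climb_action()  # override the policy entirely
    return policy(obs)


def learned_policy(obs: Observation) -> Action:
    """Stand-in for a trained RL policy (always returns the same action)."""
    return {"throttle": 0.3, "pitch": 0.1, "roll": 0.0}


low = safe_act(learned_policy, {"altitude_m": 0.5})    # override fires
high = safe_act(learned_policy, {"altitude_m": 10.0})  # policy is used
```

This only covers the one failure the designer thought of (ground collision), which is exactly the limitation the quote points out.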
|
Exactly. It is almost as if we need AI to solve the problem of properly supervising an AI's training. I was wondering whether the solution would be to add a third network, called a supervisor, to the classic actor-critic system. The supervisor would differ from the critic in architecture, and its goal would be to avoid those "terrible" outcomes. Some experiments would have to be run to decide whether this approach is viable or whether we have to keep tweaking cost functions.
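A rough sketch of how such a supervisor might sit next to the actor at action-selection time, purely as an assumption about what the idea could look like. The critic is left out because in actor-critic it only shapes training, and the "networks" here are plain functions with made-up scores; `select_action`, `actor_score`, `supervisor_risk`, and `risk_limit` are hypothetical names.

```python
from typing import Callable, Sequence


def select_action(
    state: str,
    candidates: Sequence[str],
    actor_score: Callable[[str, str], float],
    supervisor_risk: Callable[[str, str], float],
    risk_limit: float = 0.1,
) -> str:
    """Pick the actor's favourite action among those the supervisor allows."""
    allowed = [a for a in candidates if supervisor_risk(state, a) <= risk_limit]
    if not allowed:
        # Nothing passes the limit: fall back to the least risky action.
        return min(candidates, key=lambda a: supervisor_risk(state, a))
    return max(allowed, key=lambda a: actor_score(state, a))


# Toy stand-ins for the two networks (fixed lookup tables).
scores = {"left": 0.9, "right": 0.5, "stay": 0.1}
risks = {"left": 0.8, "right": 0.05, "stay": 0.0}

chosen = select_action(
    "s0",
    ["left", "right", "stay"],
    actor_score=lambda s, a: scores[a],
    supervisor_risk=lambda s, a: risks[a],
)
```

Here the actor prefers "left", but the supervisor's risk estimate rules it out, so "right" is chosen. Whether a learned supervisor would do better than a tuned cost function is the open question.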
Regarding Safety Gym, I'm not sure how what they are doing differs from simply hard-coding into your training procedure a series of checks for the probability of hitting disallowed states in the next step. For instance, in their example of a robotic arm trained with humans around, a hard-coded algorithm could track people around the arm's work envelope and give the robot a cost penalty whenever a person is detected approaching. Also, for this to result in trained avoidance of people, the network would need sufficient inputs to detect people by itself.
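The kind of hard-coded check I have in mind would look roughly like this. It is a sketch under assumptions, not Safety Gym's API: the envelope radius, the margin, the penalty weight, and the function names `person_cost` and `shaped_reward` are invented for illustration, and people are reduced to 2D points from some external tracker.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]

# Hypothetical geometry of the arm's workspace, in metres.
ENVELOPE_RADIUS_M = 1.5
WARNING_MARGIN_M = 0.5


def person_cost(arm_base: Point, people: Iterable[Point]) -> float:
    """Return 1.0 if any tracked person is within the envelope plus margin."""
    limit = ENVELOPE_RADIUS_M + WARNING_MARGIN_M
    for person in people:
        if math.dist(arm_base, person) <= limit:
            return 1.0
    return 0.0


def shaped_reward(task_reward: float, cost: float, penalty_weight: float = 10.0) -> float:
    """Fold the safety cost into the reward as a fixed penalty."""
    return task_reward - penalty_weight * cost


near = person_cost((0.0, 0.0), [(1.0, 1.0)])  # about 1.41 m away: inside
far = person_cost((0.0, 0.0), [(3.0, 0.0)])   # 3 m away: outside
```

Note that the tracker computes the cost from outside, so the policy can only learn to avoid people if its own observations carry enough information to tell where they are.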