by petters
2392 days ago
This may be useful, but it seems to me that the reward function is still relatively easy to specify here. Much of the difficulty in AI safety comes from specifying what humans really want. Perhaps the AI could observe a human playing the game and learn a reward function from that?
Learning from humans: what is inverse reinforcement learning? https://thegradient.pub/learning-from-humans-what-is-inverse...