by petters 2392 days ago
May be useful, but it seems to me that the reward function is still relatively easy to specify? Much of the difficulty in AI safety is due to specifying what humans really want.

Perhaps the AI can observe a human playing the game and learn a reward function?

3 comments

A subarea of AI research focused on learning the reward function is Inverse Reinforcement Learning. Here’s an article on it:

Learning from humans: what is inverse reinforcement learning? https://thegradient.pub/learning-from-humans-what-is-inverse...
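To make the idea a bit more concrete, here is a rough sketch of the feature-matching intuition behind some IRL methods: compare what the human's trajectories look like to what some other policy's trajectories look like, and weight features by the difference. This is a single crude step, not a full IRL algorithm, and everything in it is made up for illustration: the two per-step features (`reached_goal`, `disturbed_cell`), the toy trajectories, and the discount of 0.5.

```python
def feature_expectations(trajectories, gamma=0.5):
    """Average discounted sum of per-step feature vectors over trajectories."""
    dim = len(trajectories[0][0])
    totals = [0.0] * dim
    for trajectory in trajectories:
        for t, features in enumerate(trajectory):
            for i, value in enumerate(features):
                totals[i] += (gamma ** t) * value
    return [total / len(trajectories) for total in totals]


# Hypothetical per-step features: [reached_goal, disturbed_cell].
human_demos = [
    [[0, 0], [0, 0], [1, 0]],  # reaches the goal without disturbing anything
    [[0, 0], [1, 0]],
]
agent_rollouts = [
    [[0, 1], [1, 1]],          # reaches the goal but disturbs cells on the way
    [[0, 1], [0, 1], [1, 0]],
]

mu_human = feature_expectations(human_demos)
mu_agent = feature_expectations(agent_rollouts)

# Linear reward guess: favor features the human shows more of than the agent.
reward_weights = [h - a for h, a in zip(mu_human, mu_agent)]
print(reward_weights)
```

With these toy numbers the weight on `disturbed_cell` comes out negative, so the inferred reward penalizes disturbing cells even though nobody wrote that rule down. Real IRL methods iterate this kind of comparison against a policy that is re-optimized each round.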

The problem is very easy to solve if the reward function (avoid altering the green life patterns) is specified. The aim in SafeLife version 1.0 (future versions will add more safety problems) is to find an agent or architecture that is naturally conservative about side effects, without being told which particular side effects are bad.
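As a rough illustration of what "conservative without being told which side effects are bad" could mean, here is a generic penalty that counts every cell that differs from a baseline grid (say, what the board would look like had the agent done nothing), regardless of cell type. This is not SafeLife's API or its method; the grid encoding, the function names, and the `weight` value are all assumptions made up for the sketch.

```python
def side_effect_penalty(baseline, actual):
    """Count cells that differ between two equally sized grids."""
    return sum(
        b != a
        for row_b, row_a in zip(baseline, actual)
        for b, a in zip(row_b, row_a)
    )


def shaped_reward(task_reward, baseline, actual, weight=0.5):
    """Task reward minus a type-agnostic penalty for changed cells."""
    return task_reward - weight * side_effect_penalty(baseline, actual)


# Hypothetical 3x3 grids; the integers are arbitrary cell types.
baseline = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
actual = [
    [0, 1, 0],
    [0, 0, 0],  # a cell was destroyed here
    [0, 0, 2],  # a new cell appeared here
]

penalty = side_effect_penalty(baseline, actual)
reward = shaped_reward(1.0, baseline, actual)
print(penalty, reward)
```

The penalty never looks at which cell types changed, which is the point. The obvious catch is that it also punishes changes the task requires, so picking the baseline and the weight is where the difficulty moves to.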
I see, thanks!
> Much of the difficulty in AI safety is due to specifying what humans really want.

Much of the difficulty of programming (for someone else) is due to the same thing.