| HN Mirror



	by petters 2392 days ago
	May be useful, but it seems to me that the reward function is still relatively easy to specify? Much of the difficulty in AI safety is due to specifying what humans really want. Perhaps the AI could observe a human playing the game and learn a reward function?

3 comments

nopinsight 2392 days ago

A subarea of AI research focused on learning the reward function is Inverse Reinforcement Learning. Here’s an article on it:

Learning from humans: what is inverse reinforcement learning? https://thegradient.pub/learning-from-humans-what-is-inverse...


pde3 2392 days ago

The problem is very easy to solve if the reward function (avoid altering the green life patterns) is specified. The aim in SafeLife version 1.0 (future versions will add more safety problems) is to find an agent/architecture that is naturally conservative with respect to side effects, without being told which particular side effects are bad.
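To make "conservatism without being told which side effects are bad" concrete, here is a rough sketch of one generic approach: penalize any change to the grid relative to a baseline, without labeling particular cells as important. This is not SafeLife's API or scoring; the names `count_changed_cells`, `penalized_reward`, and `penalty_weight` are hypothetical and chosen only for illustration.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]


def count_changed_cells(baseline: Grid, current: Grid) -> int:
    """Count cells that differ between two grids of equal shape."""
    return sum(
        1
        for row_b, row_c in zip(baseline, current)
        for b, c in zip(row_b, row_c)
        if b != c
    )


def penalized_reward(
    task_reward: float,
    baseline: Grid,
    current: Grid,
    penalty_weight: float = 0.5,  # assumed value, purely illustrative
) -> float:
    """Task reward minus a penalty for every changed cell.

    The penalty does not know which cells matter; it treats every
    change to the environment as a potential side effect.
    """
    return task_reward - penalty_weight * count_changed_cells(baseline, current)


# Toy 3x3 grids: 1 marks a live cell, 0 an empty one.
baseline = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
current = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]

print(penalized_reward(1.0, baseline, current))
```

A blanket penalty like this also punishes changes the task requires, which is part of why the problem is harder than it first looks.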


petters 2392 days ago

I see, thanks!


Jeff_Brown 2392 days ago

> Much of the difficulty in AI safety is due to specifying what humans really want.

Much of the difficulty of programming (for someone else) is due to the same thing.
