Hacker News
by jonplackett 1816 days ago
Isn't this just a cock up with incentives? If they'd put a -100 score on dying it would have sorted itself out pretty quick.
7 comments

The issue with AI safety and unanticipated AI outcomes in general is that it’s always just a cock-up with incentives.

It’s easy to sort out in narrowly specified areas, but an extremely hard problem as the tasks become more general.

Even worse: if simulations are used, you now have two problems - formulating correct incentives and protecting against abusing flaws in the simulation.
Isn’t this true about all systems, not just “AI”? The definition of a software bug is an unintended behavior. In a large system, myriad intents overlap and combine in unexpected ways. You might imagine a complex enough system where the confidence that a modification doesn’t introduce an unintended behavior is near zero.
I think that’s true for many systems, not just AI.

AI is worth calling out in this regard because, if the field is successful enough, it can create dangerous systems that don’t behave how we want.

Building a safe general AI is much harder than building a general AI, which is why it’s worth considering AI as its own problem domain.

While obviously I've got the advantage of hindsight here, it seems like it should not have taken three days of analysis to see why the wolves were committing suicide. It seems obvious once the point system is explained. Perhaps some rubber-duck debugging might have helped in this case.
I wonder if they initially thought it was a bug in the software, rather than a misalignment in the point system.
I think the point is more about highlighting the fact that AI doesn't share our base assumptions. We wouldn't think to put a huge penalty on dying because humans generally think that death is bad.
Yeah, because we have a -1000 points on death built-in.
We don't receive a penalty for dying. The difference between suicidal humans and suicidal AIs is that suicidal AIs keep respawning, i.e. they are immortal.
Looking at genetic algorithms makes a great comparison. In essence, any algorithm in which the wolf commits suicide doesn't make it to the next generation. It's the equivalent of an enormous score penalty and 100% analogous to how it works for actual life.
Genetic algorithms are based on the same reward/cost function setup. They could easily arrive at the same conclusion because suicide might be the dominant strategy.
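
To make that concrete, here is a toy sketch in Python, not anything from the actual experiment. It assumes fitness is nothing but the game score, that no sheep is reachable, and it borrows the 20 second limit and -1 per second figures from another comment in this thread. The names (`fitness`, `evolve`, `give_up_after`) are made up for illustration. Selection then favors wolves that give up earlier, because nothing in the fitness function says death is bad.

```python
import random

TIME_LIMIT = 20      # seconds per episode (assumed, figure quoted elsewhere in the thread)
STEP_PENALTY = 1.0   # points lost per second alive (assumed, same source)


def fitness(give_up_after):
    """Game score of a wolf that finds no sheep and dies after `give_up_after` seconds."""
    return -STEP_PENALTY * give_up_after


def evolve(generations=50, pop_size=30, seed=0):
    """Return the best genome of each generation (genome = seconds before giving up)."""
    rng = random.Random(seed)
    population = [rng.randint(0, TIME_LIMIT) for _ in range(pop_size)]
    history = [max(population, key=fitness)]
    for _ in range(generations):
        # Selection: the higher-scoring half survives unchanged.
        survivors = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        # Reproduction: each child copies a random survivor, mutated by -1, 0 or +1 seconds.
        children = [
            min(TIME_LIMIT, max(0, rng.choice(survivors) + rng.choice((-1, 0, 1))))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
        history.append(max(population, key=fitness))
    return history


if __name__ == "__main__":
    best_per_generation = evolve()
    print("best genome, first generation:", best_per_generation[0])
    print("best genome, last generation: ", best_per_generation[-1])
```

Because the best half always survives, the best score can never get worse from one generation to the next, and the only direction that improves it is dying sooner. A GA only "penalizes" suicide if dying actually lowers the fitness being selected on.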
Yes, definitely. I misread the parent comment as snark claiming that we don't have that score penalty in reality.
Humans don't put a huge penalty on dying. We discount it and assume/pretend that once we've had a good long life then death is okay and euthanasia is preferable to suffering with no hope of recovery. AI wolves that can live for 20 seconds are unwilling to suffer -1 per second with no hope of sheep.
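
A back-of-the-envelope version of that arithmetic, as a rough sketch. The 20 second limit and -1 per second come from this comment and the -100 on dying from the top comment; the `sheep_reward` of 10 and the function itself are my own assumptions, not the experiment's actual reward code.

```python
def episode_return(strategy, time_limit=20, step_penalty=1.0,
                   death_penalty=0.0, sheep_reward=10.0, catch_time=None):
    """Total score for one episode under a per-second time penalty.

    strategy   -- "suicide" (die immediately) or "hunt" (stay alive and look for sheep)
    catch_time -- seconds until a sheep is caught, or None if no sheep is reachable
    All numeric defaults are illustrative assumptions.
    """
    if strategy == "suicide":
        # Dying at t=0 avoids the time penalty entirely; only the death penalty applies.
        return -death_penalty
    if strategy == "hunt":
        if catch_time is None or catch_time > time_limit:
            # No sheep in reach: the wolf pays the time penalty for the whole episode.
            return -step_penalty * time_limit
        return sheep_reward - step_penalty * catch_time
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for penalty in (0.0, 100.0):
        suicide = episode_return("suicide", death_penalty=penalty)
        hunt = episode_return("hunt", death_penalty=penalty)
        print(f"death_penalty={penalty}: suicide={suicide}, hopeless hunt={hunt}")
```

With no death penalty and no sheep in reach, suicide scores 0 against -20 for waiting out the clock. With a 100 point death penalty the ordering flips, but only because the penalty exceeds the worst-case time cost of 20.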
Perhaps the PhD student wasn't trying to make an AI that wins at pac-man, but investigating something else. They mention "maximizing control over environment".
One of the most typical scenarios studied in those wolf/sheep models (like http://www.netlogoweb.org/launch#http://ccl.northwestern.edu... ) is to find the best conditions for "balance" between sheep and wolves: too many wolves and the sheep go extinct, and later the wolves starve. Too many sheep and the sheep don't get enough food and also die, taking the wolves with them.
Or social commentary on the nature of depression.

If you add your penalty, and a deficit of nearby sheep, you'd expect a trifurcation of strategy: hoarders that consume the nearby sheep immediately, explorers that bet on sheep further afield, and suicides from those that have evaluated the -100 penalty to still be optimal.

That same observation, with the exact same -100 points recommendation on crashing into a boulder, was indeed also made by a commentator on social media.
No, it's a cock up with the source of the wolves. If you could respawn endlessly after death would you fear it? You'd just want the stupid game to end before you lose points from the timer.
For clarification purposes:

Let's say you are a human player playing the wolf and sheep game. The score achieved in the game decides your death in real life. Note the stark difference. Dying in the game is not the same thing as dying in real life.

If there is an optimal strategy in the game that involves dying in the game you are going to follow it regardless of whether you are a human or an AI. By adding an artificial penalty to death you haven't changed the behavior of the AI, you have changed the optimal strategy.

The human player and the AI player will both do the optimal strategy to keep themselves alive. For the AI "staying alive" doesn't mean staying alive in the game, it means staying alive in the simulation. Thus even a death fearing AI would follow the suicide strategy if that is the optimal strategy.

It is impossible to conclude from the experiment whether the AI doesn't fear death and thus willingly commits suicide or whether it fears death so much that it follows an optimal strategy that involves suicide.