Hacker News new | ask | show | jobs
by TomAnthony 1801 days ago
Similar story of unexpected AI outcomes...

As part of my PhD research, I created a simplified Pac-Man style game where the agent would simply try to stay alive as long as possible whilst being chased by the 3 ghosts. The agent was un-motivated and understood nothing about the goal, but was optimising for maximising its observable control over the world (avoiding death is a natural outcome of this).

I spent sometime trying to debug a behaviour where the agent would simply move left and right at the start of each run, waiting for the ghosts to close in. At the last minute it would run away, but always with a ghost in the cell right behind it.

Eventually, I realised this was an outcome of what it was optimising for. When ghosts reached cross-roads in the world they would got left or right randomly (if both were same distance to catching the agent). This randomness reduced the agent's control over the world, so was undesirable. Bringing a ghost in close made that ghost's behaviour completely predictable.

9 comments

Yet another similar story. A side project of mine was building a rudimentary neural network whose weights were optimized via a genetic algorithm. The goal was operating top-down, 2D self-driving cars.

The cars' "fitness" function rewarded cars for driving along the course and punished them for crashing into walls. But evidently this function punished a little too severely: the most successful cars would just drive in tight circles and never make progress on the course. But they were sure to avoid walls. :)

I believe that tactic is called "kiting" and used by speedrunners?
Yeah, waiting for the ghosts to get close was a standard strategy I used back when I played lots of Pacman.

Having all the ghosts behind you gives you more control since they'll follow you in a line.

That the ghosts follow the player is what makes the game winnable. If they formed a grid and gradually closed-in, it would be impossible to escape.

Edit: What was unexpected in this case was that the system found a strategy the programmer didn't think of.

Yup! It's also used in other games a lot.

For example in EVE Online with a 1v1 fight two basic tactics are either Kite or Brawl. A kiter that can maintain range will beat a brawler. But a brawler that 'catches' a kiter will generally win.

Yes! Exactly - kiting. I didn't know the term but when I explained the behaviour I was seeing to a colleague they told me about this.
Another similar story, I remember reading about an AI that simply paused the game when it was about to die. I can actually remember doing something similar as a child.
Same story as one I shared 4 years ago. Seems to be the best tactic! https://news.ycombinator.com/item?id=14031932

Edit: don’t want to sound accusatory

No need to be accusatory. The stories are different, just the learned behavior is the same. And not very surprising, considering your story was pre-empted by Pac-Man speedrunners, who already discovered this technique, which they call "kiting".

You can see the paper OP wrote to confirm for yourself that their story is not the same as yours: https://uhra.herts.ac.uk/bitstream/handle/2299/15376/906989....

Hah - thank you for sharing!

That is very interesting that this emerged from two different approaches.

I published my result years back, and have never heard of this emerging elsewhere before!

Didn’t take it as accusatory [but thanks to child for sharing link :)].

"Completely predictable" is different from "This would minimize the probability of being fenced in by the four ghosts." no?
hm... "keep your friends close but your enemies closer" ...?
But try to make sure your enemies don't end up surrounding you?
Yeah, that is tricky. I believe that Constantinople once found out the hard way, and thus is now Istanbul.
I guess people just liked it better that way.
How did you measure control over the world?
The method was called 'empowerment'. Two ways to explain it...

From a mathematical perspective, we used Information Theory to model the world as an information theoretic 'loop'. The agent could 'send' a signal to the world by performing an action, which would change the state of the world; the state of the world was what the agent 'received'. This obviously relies on having a model of the world and what your actions will do, but doesn't burden the model with other biases.

Pore more colloquially, the agent could perform actions in the world, and see the resulting state of the world (in my case, that was the location of the agent and of the ghosts). Part of the principle was that changes you cannot observe are not useful to you.

In an active inference approach you would have the agent minimise surprisal. Choose the action that is most likely to produce the outcome you predicted.
The approach I used was similar. The idea of maximising observed control of the world means you seek states where you can reach many other states, but _predictably_ so. This comes 'for free' when using Information Theory to model a channel.
Do you have any reading you'd recommend related to this?

I naively thought it would be some kind of Kalman filtering of sorts but from what I gather in your words it doesn't even have to be "that" complicated, right?

edit: found your link to the paper in another post ( https://news.ycombinator.com/item?id=27749619 ), thanks!

What's the tradeoff between "delete all state in the world with 100% certainty" and "be able to choose any next state of the world with (100-epsilon)% certainty"?
In Information Theory, there is a concept of Channel Capacity. If a channel is defined as the probability of the output being s if you send a, across all possible values of a, then the Channel Capacity is the maximum amount of information you can communicate across this channel, measured in bits.

To achieve the Channel Capacity you need to find the optimum distribution across a - i.e. what set of signals maximises the information you can transmit on this channel. There are known algorithms for finding this distribution (e.g. Blahut-Arimoto).

Now if you model the world as a channel, where s represents the reachable states and a represents the actions the agent can take (and the channel, P(s|a), represents the dynamics of the world), you can calculate what actions allow you maximal control (in terms of states you can controllably reach).

More info in this paper: https://uhra.herts.ac.uk/handle/2299/15376

Why would this cause the net to avoid death? Do things keep moving after pacman dies?
A while ago, a very simple agent I made had to do tasks in the maze and evaluate strategies to reach them. I wanted it to have no assumptions about the world, so it started with minimum knowledge. Its first plan was to try to remove walls, to get to the things it needed.

It is a fun feeling when your own program surprises you.

It can depend on what the agent "sees" and how many time-steps away the "consequences" are. If the ghosts are so far away that any action will take t time-steps before consequences to the agent, the actions are pseudo-random because there is no reward to optimize on.

The number of outcomes in branching_factor^t (very large) makes the action-values at t=0 (where the agent chooses between two/three actions) almost uniform random.

Yes, you are right.

I experimented with different time horizons, mostly look 3-7 steps ahead.

In terms of the 'reward', that was implicit within the model - if the ghosts caught you, your ability to influence the state of the world dropped to 0.

this sounds interesting. can you link your research or paper?
Sure! The PDF is available here:

https://uhra.herts.ac.uk/handle/2299/15376