| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by TomAnthony 1850 days ago

Similar story of unexpected AI outcomes...

As part of my PhD research, I created a simplified Pac-Man style game where the agent would simply try to stay alive as long as possible whilst being chased by the 3 ghosts. The agent was un-motivated and understood nothing about the goal, but was optimising for maximising its observable control over the world (avoiding death is a natural outcome of this).

I spent sometime trying to debug a behaviour where the agent would simply move left and right at the start of each run, waiting for the ghosts to close in. At the last minute it would run away, but always with a ghost in the cell right behind it.

Eventually, I realised this was an outcome of what it was optimising for. When ghosts reached cross-roads in the world they would got left or right randomly (if both were same distance to catching the agent). This randomness reduced the agent's control over the world, so was undesirable. Bringing a ghost in close made that ghost's behaviour completely predictable.

9 comments

joek1301 1850 days ago

Yet another similar story. A side project of mine was building a rudimentary neural network whose weights were optimized via a genetic algorithm. The goal was operating top-down, 2D self-driving cars.

The cars' "fitness" function rewarded cars for driving along the course and punished them for crashing into walls. But evidently this function punished a little too severely: the most successful cars would just drive in tight circles and never make progress on the course. But they were sure to avoid walls. :)

yodelshady 1850 days ago

I believe that tactic is called "kiting" and used by speedrunners?

joe_the_user 1850 days ago

Yeah, waiting for the ghosts to get close was a standard strategy I used back when I played lots of Pacman.

Having all the ghosts behind you gives you more control since they'll follow you in a line.

That the ghosts follow the player is what makes the game winnable. If they formed a grid and gradually closed-in, it would be impossible to escape.

Edit: What was unexpected in this case was that the system found a strategy the programmer didn't think of.

NaturalPhallacy 1850 days ago

Yup! It's also used in other games a lot.

For example in EVE Online with a 1v1 fight two basic tactics are either Kite or Brawl. A kiter that can maintain range will beat a brawler. But a brawler that 'catches' a kiter will generally win.

TomAnthony 1850 days ago

Yes! Exactly - kiting. I didn't know the term but when I explained the behaviour I was seeing to a colleague they told me about this.

Retr0id 1850 days ago

Another similar story, I remember reading about an AI that simply paused the game when it was about to die. I can actually remember doing something similar as a child.

0110101001 1850 days ago

https://youtu.be/xOCurBYI_gY&t=15m10s

edejong 1850 days ago

Same story as one I shared 4 years ago. Seems to be the best tactic! https://news.ycombinator.com/item?id=14031932

Edit: don’t want to sound accusatory

jeremysalwen 1850 days ago

No need to be accusatory. The stories are different, just the learned behavior is the same. And not very surprising, considering your story was pre-empted by Pac-Man speedrunners, who already discovered this technique, which they call "kiting".

You can see the paper OP wrote to confirm for yourself that their story is not the same as yours: https://uhra.herts.ac.uk/bitstream/handle/2299/15376/906989....

TomAnthony 1850 days ago

Hah - thank you for sharing!

That is very interesting that this emerged from two different approaches.

I published my result years back, and have never heard of this emerging elsewhere before!

Didn’t take it as accusatory [but thanks to child for sharing link :)].

wildmanx 1850 days ago

"Completely predictable" is different from "This would minimize the probability of being fenced in by the four ghosts." no?

McMiniBurger 1850 days ago

hm... "keep your friends close but your enemies closer" ...?

lancengym 1850 days ago

But try to make sure your enemies don't end up surrounding you?

inglor_cz 1850 days ago

Yeah, that is tricky. I believe that Constantinople once found out the hard way, and thus is now Istanbul.

TheDauthi 1850 days ago

I guess people just liked it better that way.

TchoBeer 1850 days ago

How did you measure control over the world?

TomAnthony 1850 days ago

The method was called 'empowerment'. Two ways to explain it...

From a mathematical perspective, we used Information Theory to model the world as an information theoretic 'loop'. The agent could 'send' a signal to the world by performing an action, which would change the state of the world; the state of the world was what the agent 'received'. This obviously relies on having a model of the world and what your actions will do, but doesn't burden the model with other biases.

Pore more colloquially, the agent could perform actions in the world, and see the resulting state of the world (in my case, that was the location of the agent and of the ghosts). Part of the principle was that changes you cannot observe are not useful to you.

greenpresident 1850 days ago

In an active inference approach you would have the agent minimise surprisal. Choose the action that is most likely to produce the outcome you predicted.

TomAnthony 1850 days ago

The approach I used was similar. The idea of maximising observed control of the world means you seek states where you can reach many other states, but _predictably_ so. This comes 'for free' when using Information Theory to model a channel.

cmehdy 1850 days ago

Do you have any reading you'd recommend related to this?

I naively thought it would be some kind of Kalman filtering of sorts but from what I gather in your words it doesn't even have to be "that" complicated, right?

edit: found your link to the paper in another post ( https://news.ycombinator.com/item?id=27749619 ), thanks!

benlivengood 1850 days ago

What's the tradeoff between "delete all state in the world with 100% certainty" and "be able to choose any next state of the world with (100-epsilon)% certainty"?

TomAnthony 1850 days ago

In Information Theory, there is a concept of Channel Capacity. If a channel is defined as the probability of the output being s if you send a, across all possible values of a, then the Channel Capacity is the maximum amount of information you can communicate across this channel, measured in bits.

To achieve the Channel Capacity you need to find the optimum distribution across a - i.e. what set of signals maximises the information you can transmit on this channel. There are known algorithms for finding this distribution (e.g. Blahut-Arimoto).

Now if you model the world as a channel, where s represents the reachable states and a represents the actions the agent can take (and the channel, P(s|a), represents the dynamics of the world), you can calculate what actions allow you maximal control (in terms of states you can controllably reach).

More info in this paper: https://uhra.herts.ac.uk/handle/2299/15376

TchoBeer 1850 days ago

Why would this cause the net to avoid death? Do things keep moving after pacman dies?

Iv 1850 days ago

A while ago, a very simple agent I made had to do tasks in the maze and evaluate strategies to reach them. I wanted it to have no assumptions about the world, so it started with minimum knowledge. Its first plan was to try to remove walls, to get to the things it needed.

It is a fun feeling when your own program surprises you.

johbjo 1850 days ago

It can depend on what the agent "sees" and how many time-steps away the "consequences" are. If the ghosts are so far away that any action will take t time-steps before consequences to the agent, the actions are pseudo-random because there is no reward to optimize on.

The number of outcomes in branching_factor^t (very large) makes the action-values at t=0 (where the agent chooses between two/three actions) almost uniform random.

TomAnthony 1850 days ago

Yes, you are right.

I experimented with different time horizons, mostly look 3-7 steps ahead.

In terms of the 'reward', that was implicit within the model - if the ghosts caught you, your ability to influence the state of the world dropped to 0.

fnord77 1850 days ago

this sounds interesting. can you link your research or paper?

TomAnthony 1850 days ago

Sure! The PDF is available here:

https://uhra.herts.ac.uk/handle/2299/15376