| HN Mirror


by ted12345 3234 days ago

I wonder if some of the things the guy says are misleading.

In the video, we see the bot "creep blocking." For those unfamiliar with Dota, players can use the model of the unit they control to obstruct the movement of allied computer-controlled units in order to gain a favorable position.

I suppose it's possible that over millions and millions of matches played against itself, the OpenAI bot "invented" this behavior for itself. But it seems more likely to me that the programmers "built that behavior in."

2 comments

popcorncolonel 3234 days ago

It would be pretty much impossible for the programmers to "build the behavior in" to the neural network, unless you mean training on supervised data or something.


visarga 3233 days ago

It's not impossible. It's called inverse reinforcement learning: a reward function is learned from external demonstrations, and that reward function is then used to teach the bot an action policy. Intuitively, the idea is to first learn what a good state and a bad state look like, based on external demonstrations, then use that to teach the bot how to act.

This kind of learning is similar to GANs, where the discriminator learns from real data and the generator learns from the discriminator.
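To make the two-stage idea concrete, here is a toy sketch: first score states from demonstrations, then derive a policy from those scores. It is not a real inverse reinforcement learning algorithm, and it says nothing about how the OpenAI bot was actually trained. The five-state line world, the demonstration trajectories, and every name in it are hypothetical, and visit frequency is used as a crude stand-in for a learned reward.

```python
from collections import Counter

N_STATES = 5          # hypothetical toy world: states 0..4 on a line
ACTIONS = (-1, +1)    # step left or step right
GAMMA = 0.9           # discount factor (assumed value)


def step(state, action):
    """Deterministic move, clipped to the ends of the line."""
    return min(max(state + action, 0), N_STATES - 1)


def learn_state_scores(demos):
    """Stage 1: score each state by how often the demonstrations visit it.

    Crude stand-in for a learned reward, not a proper IRL estimator.
    """
    counts = Counter(s for trajectory in demos for s in trajectory)
    total = sum(counts.values())
    return [counts[s] / total for s in range(N_STATES)]


def action_value(state, action, scores, values):
    """Score of the next state plus the discounted value of being there."""
    nxt = step(state, action)
    return scores[nxt] + GAMMA * values[nxt]


def derive_policy(scores, sweeps=100):
    """Stage 2: value iteration against the scores, then act greedily."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            max(action_value(s, a, scores, values) for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return [
        max(ACTIONS, key=lambda a: action_value(s, a, scores, values))
        for s in range(N_STATES)
    ]


# Made-up demonstrations in which the "expert" walks right and stays at state 4.
demos = [[0, 1, 2, 3, 4, 4, 4], [2, 3, 4, 4, 4], [1, 2, 3, 4, 4]]
scores = learn_state_scores(demos)
policy = derive_policy(scores)
print(policy)
```

The bot never sees the expert's actions here, only which states the expert spent time in. The "move right" behavior comes out of planning against the learned scores.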


popcorncolonel 3229 days ago

Very interesting! Thanks for sharing -- I'll look more into this.


mquander 3234 days ago

Given that they said explicitly that the bot invented all of its behavior for itself from scratch, it seems more likely to me that it did so.
