by nextaccountic 796 days ago
So without a learning procedure to direct Mario toward the end of the level, it can only get there on its own because the levels (and Mario's in-memory data structures in general) are pretty small, right?

Put differently, if there were tons of irrelevant state, it could get trapped somewhere and never actually complete a level, even after centuries of fuzzing.

Something similar was tested in the Twitch Plays Pokemon [0] gaming experiment, but there the inputs only appeared random: there were "factions" that either tried to sabotage the run or tried to make it progress. Ultimately the majority of the players were cooperating to complete the game, and this was a deciding factor in the run succeeding. Maybe fuzzing alone can't complete Pokemon the way that TPP did (or the way reinforcement learning could).

[0] https://en.wikipedia.org/wiki/Twitch_Plays_Pok%C3%A9mon

1 comment

The space is large. It just turns out that if you direct Mario to explore with a bit of bias (e.g., generally favoring exploration from states where Mario's x coordinate is further to the right), it completes the levels.
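
To make the "bit of bias" idea concrete, here is a minimal toy sketch, not our actual code. It assumes a made-up one-dimensional "level" in which a state is just an x position; `toy_step`, `pick_state`, `explore`, and the `bias` exponent are all hypothetical names invented for illustration. The only point is that saved states further to the right get picked more often as the starting point for the next random input.

```python
import random

def pick_state(pool, rng, bias=2.0):
    """Sample a saved state, weighting states further right more heavily.

    The weighting formula is an assumption made for this sketch.
    """
    min_x = min(s["x"] for s in pool)
    weights = [(s["x"] - min_x + 1) ** bias for s in pool]
    return rng.choices(pool, weights=weights, k=1)[0]

def toy_step(state, button):
    """Hypothetical stand-in for an emulator: move left, stay, or move right."""
    dx = {"left": -1, "none": 0, "right": 1}[button]
    return {"x": max(0, state["x"] + dx)}

def explore(goal_x, max_iters, rng, bias=2.0):
    """Fuzz random inputs from biased-sampled states until goal_x is reached.

    Returns the number of iterations used, or None if the budget runs out.
    """
    pool = [{"x": 0}]  # saved states we can resume from
    seen = {0}         # x positions already in the pool
    for i in range(max_iters):
        state = pick_state(pool, rng, bias)
        button = rng.choice(["left", "none", "right"])
        new = toy_step(state, button)
        if new["x"] >= goal_x:
            return i + 1
        if new["x"] not in seen:
            seen.add(new["x"])
            pool.append(new)
    return None

if __name__ == "__main__":
    print(explore(goal_x=30, max_iters=5000, rng=random.Random(0)))
```

Setting `bias=0` in this toy makes the choice of starting state uniform, which is a quick way to compare biased and unbiased exploration. A real game has far more state than a single coordinate, so treat this only as an illustration of the selection step.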

I think Pokemon could be beaten with our techniques. Final Fantasy on NES poses similar problems to Pokemon, and some progress has been made on that game in the past, here.