Hacker News
by nrmn 2973 days ago
From my viewpoint, as both a researcher and someone who has built frameworks around environments/games:

- Each step within the game has to be extremely fast, i.e. the game should be able to run as fast as the machine allows while keeping physics etc. consistent.

- Runnable via library import such that there is no drawing to the screen.

- Should be easy to reset the environment to an initial state.

- RNG state should be seedable.

- I highly recommend supporting an interface identical to the one found in OpenAI's gym. Check their docs out. Even better would be to have your game importable as an environment in gym.

- Configurable screen resolution would be great (eg. output 120x100)

- The environment is "hackable", e.g. the maps or levels can be modified or loaded, say, via some ASCII map.

- Should support multiple copies of the game running at once.

- A nice to have would be if the current environment state could be exported and loaded later.

- Expose some information/signals such that a reward signal can be created. Or better yet, define one yourself as the game creator.
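
To make the list above concrete, here is a minimal sketch of a headless, seedable environment. Everything in it is hypothetical: `GridGame`, its ASCII map format, and the `save_state`/`load_state` names are made up for illustration. It only loosely mirrors the `reset()`/`step()` convention popularized by gym and does not import or subclass gym.

```python
import random


class GridGame:
    """Hypothetical toy game: walk along a 1-D ASCII map from S to G."""

    def __init__(self, ascii_map="S...G", seed=None):
        self.ascii_map = ascii_map      # "hackable" level layout
        self.rng = random.Random(seed)  # per-instance RNG, no global state
        self.pos = 0
        self.reset()

    def seed(self, seed):
        """Re-seed this instance's RNG."""
        self.rng.seed(seed)

    def reset(self):
        """Return to the initial position and give back the first observation."""
        self.pos = self.ascii_map.index("S")
        return self._observe()

    def step(self, action):
        """Advance one tick without drawing anything. action is -1 or +1."""
        # About 10% of moves "slip" and do nothing; this uses the seeded RNG.
        if self.rng.random() >= 0.1:
            last = len(self.ascii_map) - 1
            self.pos = max(0, min(last, self.pos + action))
        done = self.ascii_map[self.pos] == "G"
        reward = 1.0 if done else 0.0   # reward defined by the game creator
        return self._observe(), reward, done, {}

    def _observe(self):
        return {"pos": self.pos, "goal": self.ascii_map.index("G")}

    def save_state(self):
        """Export everything needed to resume later, including RNG state."""
        return (self.pos, self.rng.getstate())

    def load_state(self, state):
        self.pos, rng_state = state
        self.rng.setstate(rng_state)


# Several independent copies can run side by side in one process.
envs = [GridGame(seed=s) for s in range(4)]
```

Because each instance owns its RNG and there is no rendering in `step()`, copies do not interfere with each other and the loop runs as fast as the machine allows.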

3 comments

Excellent list.

> - Should be easy to reset the environment to an initial state.

Adding on to that, the ability to rewind the game state is a pretty big deal.

The biggest deal for AI researchers, though, is that you implement a replay function and format, and publish lots of tooling around them (to read and parse them, etc.; at least in Python).

Also, if it's an online game, save the replays serverside and publish them somewhere. Kaggle will be happy to take it I'm sure.
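
As a rough idea of what a replay format plus tooling could look like, here is a sketch that stores a replay as JSON lines: one header record with the seed, then one record per action. The record layout and the `write_replay`/`read_replay` names are assumptions for illustration, not an existing format.

```python
import io
import json


def write_replay(stream, seed, actions):
    """Write a header line followed by one JSON line per action."""
    header = {"type": "header", "version": 1, "seed": seed}
    stream.write(json.dumps(header) + "\n")
    for tick, action in enumerate(actions):
        record = {"type": "action", "tick": tick, "action": action}
        stream.write(json.dumps(record) + "\n")


def read_replay(stream):
    """Parse a replay back into (header, list of actions)."""
    header, actions = None, []
    for line in stream:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record["type"] == "header":
            header = record
        elif record["type"] == "action":
            actions.append(record["action"])
    return header, actions


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_replay(buffer, seed=42, actions=[1, 1, -1, 1])
buffer.seek(0)
header, actions = read_replay(buffer)
```

If the game is deterministic given its seed, storing the seed and the action list is enough to re-simulate a match up to any tick, which also gives you a crude form of rewind.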

> - I highly recommend supporting an interface identical to the one found in OpenAI's gym. Check their docs out. Even better would be to have your game importable as an environment in gym.

> - Configurable screen resolution would be great (eg. output 120x100)

I think both of these things assume you are going to be doing RL from pixels. I think to support a wider variety of RL/control research, you should be able to get the game state in a structured form and not just a flat vector the way gym does it.
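
To illustrate the difference, here is a small sketch with made-up `Unit` and `GameState` types (hypothetical, not from any real game). The structured form keeps names and nesting; a flat vector can always be derived from it, but not the other way around.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Unit:
    kind: str
    x: int
    y: int
    hp: int


@dataclass
class GameState:
    tick: int
    player: Unit
    enemies: List[Unit]


def to_flat_vector(state):
    """Derive a flat numeric vector; names and grouping are lost here."""
    vec = [float(state.tick)]
    for unit in [state.player] + state.enemies:
        vec.extend([float(unit.x), float(unit.y), float(unit.hp)])
    return vec


state = GameState(tick=3, player=Unit("hero", 1, 2, 10), enemies=[Unit("orc", 4, 5, 6)])
structured = asdict(state)      # nested dicts/lists, easy to serialize
flat = to_flat_vector(state)    # what a pixel-free vector interface might expose
```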

But even then, that's still just one branch of AI research. I've seen people tune how games behave to maximize engagement with the game, and in that setting just controlling the player is not enough. The work I saw looked at controlling level progression to increase engagement, but you could imagine controlling other bits of the game. That is particularly relevant if your game is not symmetric and the metric you care about is not just making the best AI.

Maybe not AI, but people also do research on how to replace components of games with ML components and the results can be pretty cool, e.g. https://www.youtube.com/watch?v=Ul0Gilv5wvY

Which is just to say that there is no one-size-fits-all approach here.

May I ask what you mean by "RNG state should be seedable"?

If the game depends on random events (e.g. an attack does random damage between 3 and 8), it would be useful to be able to make sure the randomness is always the same, at least when you want it to be.

Same randomness? I can't get the gist of the term.

In addition to the other explanation, check out today's NYT article on how one guy cracked the lottery because of pseudo-random behaviour in the lottery code.

https://www.nytimes.com/interactive/2018/05/03/magazine/mone...

Most random sources are PRNGs rather than 'true' random sources, and sometimes it's useful (for debugging, for analysis or just for interest) to be able to use a predictable pattern of otherwise random numbers.

One way to do that is to allow 'seeding' the PRNG, which returns the random function to a known state so that the sequence of numbers it produces is the same each time.

Or, by example: say I make 5 calls to the PRNG with seed value '0' and see [5, 2, 9, 18, 4, ...], and that causes the agent I'm testing to do something utterly weird. To re-run my agent and observe the effect in detail, I need the same [5, 2, 9, 18, 4, ...] sequence; otherwise I'll be forced to run repeatedly until I observe the same glitch. By re-seeding the PRNG to '0', it will predictably return that sequence rather than a new, random one.
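
In Python that could look roughly like the sketch below, using the standard library's `random.Random`. The `roll_sequence` helper and the 3 to 8 damage range are just illustrative, and the actual numbers you get will differ from the made-up sequence above.

```python
import random


def roll_sequence(seed, count=5):
    """Return `count` damage rolls between 3 and 8 from a freshly seeded PRNG."""
    rng = random.Random(seed)  # private generator, seeded to a known state
    return [rng.randint(3, 8) for _ in range(count)]


first_run = roll_sequence(0)
second_run = roll_sequence(0)   # same seed, so the same rolls come back

print(first_run == second_run)  # True
```

So if a run with seed 0 triggers the glitch, re-running with seed 0 reproduces the same rolls and therefore the same situation, provided nothing else in the game is non-deterministic.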

It's because most of the randomness used by software is actually pseudorandom. What that means is that you actually use a defined sequence. The sequence has behaviour that's close enough to what you'd get if you were picking random samples from a distribution for the desired application.

The key difference is that it's reproducible, and that if you have insight into the parameters of the sequence (e.g. the seed and the current position in the sequence), you can predict the results. That's why people often get upset when these pseudorandom number generators are used for security purposes.
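
A quick way to see the predictability, sketched with Python's standard `random` module: capture the generator's internal state, then use a second generator to "predict" what the first one will produce next. The seed and ranges here are arbitrary choices for the demo.

```python
import random

rng = random.Random(1234)
rng.random()                      # advance somewhere into the sequence

snapshot = rng.getstate()         # seed-derived state plus current position
upcoming = [rng.randint(1, 100) for _ in range(3)]

# Anyone holding the snapshot can reproduce the "random" future exactly.
clone = random.Random()
clone.setstate(snapshot)
predicted = [clone.randint(1, 100) for _ in range(3)]

print(predicted == upcoming)      # True
```

That is great for debugging and replays, and exactly what you don't want for tokens or passwords; Python's docs point to the `secrets` module for those cases.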

The seed is a value that is used to generate the sequence. If you use the same seed, you get the same sequence.

Typically when you init a random generator, it'll let you pass a number in if you want to. That number determines the sequence of "random" output from the generator; different seeds give sequences that look unrelated to each other. If you re-use the same seed, you'll get the same sequence of "random" numbers as before. This is useful for testing or re-trying sequences involving "random" in a reproducible way.