by rhdunn 749 days ago
I recall some Minecraft tests being saved worlds with redstone logic that will light a beacon green if it is working or red if not. That's useful for games like that.

For games like Starcraft 2 with replay functionality, you could probably record/use several matches and test that the behaviour matches the recorded behaviour. If you can make your game have a replay feature you can make use of this, even if you don't ship that replay code.
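A minimal sketch of the replay idea in Python, assuming the simulation is deterministic for a given list of inputs. `State`, `step`, and `run_replay` are made-up names for illustration, not taken from any real engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical game state: a unit's position and hit points."""
    x: int = 0
    hp: int = 10


def step(state: State, command: str) -> State:
    """Advance the simulation by one tick for a single recorded command."""
    if command == "right":
        return State(state.x + 1, state.hp)
    if command == "left":
        return State(state.x - 1, state.hp)
    if command == "hit":
        return State(state.x, max(0, state.hp - 3))
    return state  # unknown commands are ignored


def run_replay(commands: list[str]) -> list[State]:
    """Feed recorded commands through the simulation and collect every state."""
    state = State()
    history = [state]
    for command in commands:
        state = step(state, command)
        history.append(state)
    return history


# A "recording" is just the inputs plus the states observed when it was made.
recorded_commands = ["right", "right", "hit", "left"]
recorded_states = run_replay(recorded_commands)

# Later, after a refactor, re-run the same inputs and compare with the recording.
assert run_replay(recorded_commands) == recorded_states
```

Storing only the inputs plus the final state would keep a recording small, at the cost of a less precise failure message.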

For things like CYOA type games or decision trees, you could have a logging mechanism that prints out the choices, player stats, hidden stats, etc. and then have a way to run through the decisions, then check the actual log output against the expected output. -- I've done something similar when writing parsers by printing out the parse tree (for AST parser APIs) or the parse events (for reader/SAX parser APIs).
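Roughly what I mean, as a sketch in Python. The story graph, the `courage` stat, and the `play` function are all invented for the example; a real game would log whatever stats it actually tracks.

```python
# Hypothetical story graph: node -> (text, {choice: (next_node, courage_delta)})
STORY = {
    "start": ("You reach a cave.", {"enter": ("cave", 1), "leave": ("end", 0)}),
    "cave": ("A bear wakes up.", {"fight": ("end", 2), "run": ("end", -1)}),
    "end": ("The end.", {}),
}


def play(choices: list[str]) -> str:
    """Walk the story with scripted choices and return the log as text."""
    node, courage, log = "start", 0, []
    for choice in choices:
        next_node, delta = STORY[node][1][choice]
        courage += delta
        log.append(f"{node}: chose {choice!r}, courage={courage}")
        node = next_node
    log.append(f"finished at {node}")
    return "\n".join(log)


# The expected log is written (or recorded) once and checked on every run.
EXPECTED = """\
start: chose 'enter', courage=1
cave: chose 'run', courage=0
finished at end"""

assert play(["enter", "run"]) == EXPECTED
```

Because the log is plain text, a failing test shows up as an ordinary line diff.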

I'm sure there are other techniques for testing other parts of the system. For example, you could test the rendering by saving the render to an image and comparing it against an expected image. IIRC, Firefox does something similar for some systems like the SVG renderer and the HTML paint code.
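A sketch of an image comparison with some tolerance, in plain Python with images as nested lists of RGB tuples. This is only an illustration of the idea, not how Firefox does it; `differing_fraction` and the default tolerance of 8 are assumptions picked for the example.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def differing_fraction(actual: Image, expected: Image, tolerance: int = 8) -> float:
    """Return the fraction of pixels where any channel differs by more than tolerance."""
    if len(actual) != len(expected) or any(
        len(a) != len(e) for a, e in zip(actual, expected)
    ):
        raise ValueError("images have different dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    bad = 0
    for row_a, row_e in zip(actual, expected):
        for pixel_a, pixel_e in zip(row_a, row_e):
            if any(abs(ca - ce) > tolerance for ca, ce in zip(pixel_a, pixel_e)):
                bad += 1
    return bad / total


expected = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (0, 0, 0)]]
actual = [[(250, 3, 0), (0, 255, 0)], [(0, 0, 255), (90, 0, 0)]]

# One of four pixels is off by more than the tolerance.
assert differing_fraction(actual, expected) == 0.25
```

A test would then assert that the fraction stays under some small threshold instead of demanding a byte-identical image.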

Various of these features (replay, screenshots) are useful to have in the main game.


You're right about the parts that are mostly state machines. They have a defined input and output. Tests are straightforward to implement and adjust.

But recording and replaying matches? Taking screenshots and comparing the output? Just think about it: If you have recorded a match and change the hitpoints of a single creature, the test could possibly fail. And then? Re-record the match?

The same applies to screenshots: What happens if models, sprites or colors change?

In my experience, tests like this are annoying, because:

1) They take a long time to create and adjust/recreate.

2) They fail for minor reasons.

3) It takes time to understand what such tests even measure if someone else made them.

4) You need a large, self-made framework to support such tests.

5) It takes a long time to run them, because they are time dependent.

6) They hinder you from making large changes.

7) It's cheaper to have some low-wage game testers play your game. Or better, make the game early access and let thousands of players test it for free, while even making money from them.

Yes, when you are intentionally changing the output, you simply regenerate the gold file used as the reference (and yes, that should be easy). It’s brittle for sure, but it does catch unintentional changes and should be used where relevant (if sparingly). There are definitely existing frameworks that do this (e.g. Jest calls this snapshot testing and has tooling to make it easy).
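A minimal sketch of the regenerate step in Python. `check_golden` and its `update` switch are hypothetical names for this example, similar in spirit to Jest's `--updateSnapshot` flag; recording the reference automatically when the file is missing is a design choice here, not a requirement.

```python
import tempfile
from pathlib import Path


def check_golden(actual: str, golden_path: Path, update: bool = False) -> bool:
    """Compare output with the stored reference, or rewrite the reference on request."""
    # Assumption for this sketch: a missing gold file is recorded, not treated as a failure.
    if update or not golden_path.exists():
        golden_path.write_text(actual, encoding="utf-8")
        return True
    return golden_path.read_text(encoding="utf-8") == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "report.golden.txt"
    assert check_golden("hp=10\n", golden)               # first run records the reference
    assert check_golden("hp=10\n", golden)               # unchanged output passes
    assert not check_golden("hp=12\n", golden)           # unintentional change is caught
    assert check_golden("hp=12\n", golden, update=True)  # intentional change: regenerate
    assert check_golden("hp=12\n", golden)               # the new reference now passes
```

The regenerated gold file gets committed along with the change, so the diff shows reviewers exactly how the output moved.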

I’m sorry your experiences with this kind of stuff have been bad. I’ve generally had good experiences in the machine learning space where we used it judiciously where appropriate but didn’t overdo it.

I don’t see how it can ever hinder you though: you can always decide “I don’t care that the output has changed dramatically, it’s the new ground truth” as long as you communicate that this is what’s happening in your commit. What it doesn’t allow is output that differs every time you run it, but that’s generally a positive (randomness should be injected intentionally and deterministically).
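By deterministic injection I mean something like the following sketch, where the code under test takes its random number generator as a parameter. `roll_loot` and the loot table are invented for the example; `random.Random` is the standard library class.

```python
import random


def roll_loot(rng: random.Random, count: int) -> list[str]:
    """Pick loot using the RNG passed in, never the global one."""
    table = ["sword", "shield", "potion", "gold"]
    return [rng.choice(table) for _ in range(count)]


# Same seed, same sequence: the output can be compared against a reference.
assert roll_loot(random.Random(42), 5) == roll_loot(random.Random(42), 5)
```

Production code can pass an unseeded `random.Random()`, while tests pass a fixed seed.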