Hacker News
by Steuard 750 days ago
I guess I'm not entirely clear on the background/context here. I gather that the tweet's author isn't a serious NetHack player himself, but he is trying to train a neural net to play the game, and his training system takes a model created by someone else as a baseline and tries to fine-tune it? But despite NetHack being based on randomly generated dungeons, the other model somehow gets a consistent score every time? And even though the reference system reliably gets the same score through all the randomization of the dungeon, the game's full moon mechanic throws it off significantly.

I feel like I understand most of the pieces of this story individually, but I'm having trouble assembling them into a coherent whole.


Their model was trained only on the runs in its dataset, which didn't include enough full-moon games. So its performance degraded when it encountered a sufficiently novel variant.

This is where the "I" in AI always falls down: input that differs sufficiently from what it was trained on.

E.g., train facial recognition on a corpus of predominantly white American faces, and African Americans suffer a horribly high false-positive rate when the cops use your model on surveillance footage.

I don't know exactly about this case, but when training models like this I tend to fix a small set of seeds (or even just one) to use as a baseline 'quality measure'. This risks over-tuning to those seeds, but always measuring quality on random seeds means you can misjudge a model because you happened to get particularly lucky, or unlucky, seeds.
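A minimal sketch of that fixed-seed evaluation idea, assuming a hypothetical `play_episode(skill, seed)` toy game whose score is the policy's skill plus seed-dependent luck. This is not NetHack or any real environment; `eval_fixed`, `eval_random`, and the noise model are all made up for illustration.

```python
import random
import statistics

def play_episode(skill: float, seed: int) -> float:
    """Hypothetical game: score = policy skill + seed-dependent luck."""
    rng = random.Random(seed)            # the seed fixes the "dungeon"
    return skill * 100 + rng.gauss(0, 50)

EVAL_SEEDS = range(8)                    # the same dungeons on every evaluation

def eval_fixed(skill: float) -> float:
    """Average score over the fixed evaluation seeds."""
    return statistics.fmean(play_episode(skill, s) for s in EVAL_SEEDS)

def eval_random(skill: float, n: int = 8) -> float:
    """Average score over fresh random seeds; varies from call to call."""
    return statistics.fmean(
        play_episode(skill, random.randrange(2**32)) for _ in range(n)
    )

print(eval_fixed(1.0) == eval_fixed(1.0))    # → True
print(eval_random(1.0), eval_random(1.0))    # two noisy estimates, almost always different
print(eval_fixed(1.1) - eval_fixed(1.0))     # ≈ 10, the true skill gap
```

Because both candidates face identical dungeons, the luck terms cancel and the fixed-seed difference is just the skill gap (up to floating-point rounding). The flip side is the over-tuning risk: a policy can get good at those eight particular dungeons.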

However (and again, I've hit this), sometimes you don't pin down everything that matters, and you still get unexpected variation, like in this case.

I imagine the score they're reporting is the average over a fixed set of seeds. They just didn't realize that the seed alone doesn't determine the play experience: the system clock is a factor too.
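To make the clock leak concrete, here is a Python port of the `phase_of_the_moon()` calculation as I remember it from NetHack's C source (`hacklib.c`), returning 0-7 with 0 as new moon and 4 as full moon. Treat it as an approximation, not a verified copy. `play_episode`, its +200 full-moon bonus, and the pinned `EVAL_DATE` are hypothetical; the point is only that a fixed seed list is not enough when the game also reads the date.

```python
import datetime
import random
import statistics

def phase_of_the_moon(date: datetime.date) -> int:
    """Return 0-7, where 0 is new moon and 4 is full moon."""
    diy = date.timetuple().tm_yday - 1       # C's tm_yday is 0-based
    goldn = ((date.year - 1900) % 19) + 1    # C's tm_year counts from 1900
    epact = (11 * goldn + 18) % 30
    if (epact == 25 and goldn > 11) or epact == 24:
        epact += 1
    return ((((diy + epact) * 6) + 11) % 177 // 22) & 7

def play_episode(seed: int, date: datetime.date) -> int:
    """Hypothetical game: the seed fixes the dungeon, but a full moon
    (read from the date, not the seed) adds a luck bonus."""
    rng = random.Random(seed)
    bonus = 200 if phase_of_the_moon(date) == 4 else 0
    return rng.randrange(1000) + bonus

EVAL_SEEDS = range(8)
# Pin the date alongside the seeds so every evaluation sees the same moon.
EVAL_DATE = datetime.date(2024, 1, 11)       # phase 0 (new moon) by this formula

def evaluate(date: datetime.date = EVAL_DATE) -> float:
    return statistics.fmean(play_episode(s, date) for s in EVAL_SEEDS)

print(phase_of_the_moon(datetime.date(2024, 1, 25)))    # → 4
print(evaluate() == evaluate())                         # → True
print(evaluate(datetime.date(2024, 1, 25)) - evaluate())  # → 200.0
```

Same seeds, different day: the score jumps. How you pin the date the real game sees depends on your harness; the design choice is simply to treat the clock as part of the evaluation config, just like the seed list.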