by sigmoid10 743 days ago
It's not a single event, it's more like a new general game state that was never seen during training. Imagine learning to play the violin really well and then someone changes the way acoustics permeate. It doesn't matter if you're a human or an ML algorithm, you're going to have a hard time playing like before.

But something is wrong in the learning, because as a human NetHack player who has ascended, I can say that we don't play radically differently on full moons. Yes, the random numbers go your way slightly more, but that's about it.

This tells me the algo is trying too hard to predict the game or learn a decent static strategy, rather than make situational decisions.

The issue is with werecreatures on a full moon. Most humans exposed to Western culture (likely all NetHack players) have heard of werewolves. I think it’s safe to say that everyone who has heard of werewolves knows they are most dangerous on a full moon. Even if you are a total NetHack beginner, you know to avoid these monsters on the full moon. The game even helpfully reminds you of this information, both by telling you about the full moon and by having a werewolf howl incessantly when it’s on the same level. However, the game does not explicitly fill in the gap for you. It expects culture to do that.

The advantage of human common sense over machine learning models — at least when it comes to role-playing games — is that we carry around a ton of this cultural information. A model trained only on NetHack — not on broader culture or folklore/fairytales/mythology/fantasy — is simply not going to be aware of this link between full moons and specific monsters becoming more dangerous. So if it’s developed a fairly naïve strategy of just fighting or avoiding everything in its path based on a model of relative strength then it’s going to be tripped up when an outside event (the phase of the moon) upends that model.

I disagree. There are a lot of idiosyncrasies about monsters in NetHack such that getting good at NetHack is 99.9% about learning the NetHack world, not the real world. The werecreature game mechanic is no different than the POI or HALLU effect, so I don't think the AI needs any special knowledge.

I bet it comes down to how much memory the algorithm has, since the transformation might occur way later than being bitten, while most poison kills are fairly quick. The problem is NetHack requires you to have at minimum 1000 turns of memory to know when to pray. Even more if you want to keep track of where stuff was.

TFA states that the agent was trained for points, and another user states that some critters are a lot more dangerous during full moons.

Wouldn’t be very surprising if the agent hyper-optimised farming those critters for points. It would not be able to change strategy if the cost/benefit of that farming changed massively, so would now be performing significantly worse.
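
To make the cost/benefit argument concrete, here is a rough sketch of how a decision frozen at training time can flip from profitable to ruinous. Everything in it is an assumption for illustration: the `Encounter` fields, the `static_policy` helper, and all the numbers are made up, and none of it comes from NetHack's source or from the agent described in TFA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    """Hypothetical summary of one kind of fight (illustrative values only)."""
    points: float        # score gained for winning the fight
    death_prob: float    # chance the fight ends the run
    future_score: float  # score the agent would still expect to earn if it survives


def expected_gain(e: Encounter) -> float:
    """Expected change in final score from choosing to fight."""
    return (1 - e.death_prob) * e.points - e.death_prob * e.future_score


def static_policy(seen_in_training: Encounter) -> bool:
    """A decision fixed at training time: fight if it paid off back then."""
    return expected_gain(seen_in_training) > 0


# Made-up numbers: same reward, but the fight is far deadlier on a full moon.
normal = Encounter(points=100, death_prob=0.02, future_score=2000)
full_moon = Encounter(points=100, death_prob=0.20, future_score=2000)

always_fights = static_policy(normal)       # learned under normal conditions
gain_when_trained = expected_gain(normal)   # positive: farming pays off
gain_on_full_moon = expected_gain(full_moon)  # negative: same habit now loses score

print(always_fights, gain_when_trained, gain_on_full_moon)
```

With these invented numbers the policy keeps fighting because it only ever saw the first case, while the expected gain under the second case is well below zero. An agent that re-evaluated `expected_gain` per situation would back off instead.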