by stetrain 743 days ago
Humans train simultaneously as they operate, and humans can see the message about the full moon.

If the full moon message is not included as an input to the ML model, and the model is operated with only the training it got in non-full-moon mode, its operating score in full-moon mode may be lower.

Even if it had proportional training time in full-moon mode to incorporate that into the model, if you don't tell it when full-moon mode is active, wouldn't the optimal behavior be to optimize the score for the 27 of 28 days rather than the 1 of 28 days of the month?

If full-moon-mode is an input to the model, then it can be trained to optimize for both scenarios.
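
To put toy numbers on that: the per-day scores below are made up purely for illustration (they are not measured from NetHack or from any real model), and the function names are hypothetical. Only the 27/28 vs 1/28 split comes from the comment above. A rough sketch in Python of the expected score with and without the mode as an input:

```python
from fractions import Fraction

# Hypothetical per-day scores, SCORES[strategy][mode]. Illustrative numbers only.
SCORES = {
    "normal_tuned": {"normal": Fraction(10, 10), "full_moon": Fraction(4, 10)},
    "moon_tuned": {"normal": Fraction(7, 10), "full_moon": Fraction(9, 10)},
}
# Mode frequencies assumed from the comment: 27 of 28 days vs 1 of 28.
P_MODE = {"normal": Fraction(27, 28), "full_moon": Fraction(1, 28)}


def expected_score_blind(strategy):
    """Expected score when a single strategy must be used on every day."""
    return sum(P_MODE[m] * SCORES[strategy][m] for m in P_MODE)


def best_blind_strategy():
    """Strategy a model would settle on if it cannot tell the modes apart."""
    return max(SCORES, key=expected_score_blind)


def expected_score_informed():
    """Expected score when the mode is an input, so the best strategy is picked per mode."""
    return sum(P_MODE[m] * max(SCORES[s][m] for s in SCORES) for m in P_MODE)
```

With these assumed numbers, the blind optimum is the normal-day strategy, which simply eats the loss on the rare full-moon day. Giving the model the mode as an input can only match or beat that, since it can pick the better strategy in each mode.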


https://nethackwiki.com/wiki/Time

I predict the next "annoying non-bug" will be Friday, June 13th of 2025.

So for ML to work, it has to know all permutations of a game? Does that mean ML is useless for non-deterministic games with random outcomes or procedural generation?
The thing it has learned previously needs to apply to the next run.

If you train an ML model on thousands of attempts at going around some racetracks where touching the walls slows you down, and the score is achieved by executing a fast lap, and the inputs to the model include where the car is and where the walls are, it should optimize towards avoiding touching the wall.

This behavior would likely still work even on new procedurally generated tracks that the model had not previously seen, as long as the relationship of inputs (car, walls) to desired behavior (fast lap) still applied.

Now suppose that every N runs, for a large value of N, the game changes so that the walls are actually speed boosts and the center of the track slows you down, and there is no input to the ML model to tell it that the situation is different. It will initially try the previous strategy and perform worse, and it will be difficult to train it to handle both versions of the game without some discriminating input value to train on.
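
Here is a minimal sketch of the flipped-rules scenario, under heavy simplifying assumptions: the "track" is reduced to a single choice between two hypothetical driving lines, the rules flip on every 10th run, and the learner is a plain epsilon-greedy agent over running-average value estimates. None of these names or numbers come from a real racing environment; the only point is to compare an agent that sees a flag for the rule set with one that does not.

```python
import random

ACTIONS = ["center", "walls"]  # hypothetical driving lines


def reward(action, flipped):
    """Normal rules reward the center line; flipped rules reward hugging the walls."""
    good = "walls" if flipped else "center"
    return 1.0 if action == good else 0.0


def train(see_flag, runs=5000, flip_every=10, epsilon=0.2, seed=0):
    """Epsilon-greedy learner keeping a running-average value per (observation, action)."""
    rng = random.Random(seed)
    value, count = {}, {}
    for run in range(runs):
        flipped = run % flip_every == flip_every - 1  # every N-th run flips the rules
        obs = flipped if see_flag else None  # the blind agent gets no discriminating input
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: value.get((obs, a), 0.0))  # exploit
        key = (obs, action)
        count[key] = count.get(key, 0) + 1
        old = value.get(key, 0.0)
        value[key] = old + (reward(action, flipped) - old) / count[key]
    return value


def evaluate(value, see_flag):
    """Score of the trained agent's greedy choice under each rule set."""
    scores = {}
    for flipped in (False, True):
        obs = flipped if see_flag else None
        action = max(ACTIONS, key=lambda a: value.get((obs, a), 0.0))
        scores["flipped" if flipped else "normal"] = reward(action, flipped)
    return scores
```

Calling `evaluate(train(True), True)` and `evaluate(train(False), False)` compares the two agents. In this toy setup the agent without the flag ends up committing to the line that wins on the common rule set, while the agent with the flag can keep a separate estimate for each rule set.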

Well, that is the whole trick. ML models ideally generalize from the training inputs to whatever new inputs show up during inference. For example, a vision model should recognize an image of a dog as a dog even if it was never trained on that exact image. But that generalization always has limits. Usually the score decreases substantially the further "out-of-domain" the inputs are. So this model works fine when running a randomly generated dungeon it has never seen, but not when running a set of game rules it has never seen.
It still worked, just not as well. The program was trained and built its intuition for the game. It had a set script and assumptions.

Some of those assumptions no longer held, and since it was no longer learning/training, it couldn't adjust to the new conditions, so it didn't do as well.

If you, a human, were forced to follow a set script and assumptions, the same would happen to you.