| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by stetrain 743 days ago

The thing it has learned previously needs to apply to the next run.

If you train an ML model on thousands of attempts at going around some racetracks where touching the walls slows you down, and the score is achieved by executing a fast lap, and the inputs to the model include where the car is and where the walls are, it should optimize towards avoiding touching the wall.

This behavior would likely still work even on new procedurally generated tracks that the model had not previously seen, as long as the relationship of inputs (car, walls) to desired behavior (fast lap) still applied.

If every N number of runs for a large value of N the game changes so that the walls are actually speed boosts and the center of the track slows you down, and there is no input to the ML model to tell it that the situation is different, it will initially try the previous strategy and perform worse, and it will be difficult to train it to handle both versions of the game without some discriminating input value to train on.