by ziofill 36 days ago
Why? It should get the reward for getting longer, but not for getting longer more quickly.

Because otherwise the sessions could last forever. Think of a snake of length 1 or 2 figuring out that going left, down, up, right over and over again doesn't lose any points. You're now trapped in a local optimum. You need to make the AI impatient (lose points as time passes) or it'll never learn.
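To make the "impatience" idea concrete, here is a minimal sketch of a per-step penalty. The function name and all the reward values are made up for illustration; nothing in this thread specifies them.

```python
def shaped_reward(ate_apple: bool, died: bool,
                  apple_reward: float = 1.0,
                  death_penalty: float = -1.0,
                  step_penalty: float = -0.01) -> float:
    """Reward for a single move. All constants are illustrative assumptions."""
    if died:
        return death_penalty
    # Every move costs a little, so circling in place is never free.
    reward = step_penalty
    if ate_apple:
        reward += apple_reward
    return reward


# A snake that circles for 200 moves without eating ends up with a negative return.
looping_return = sum(shaped_reward(ate_apple=False, died=False) for _ in range(200))
```

With a penalty this small, a long detour that still ends at the apple stays net positive, while looping indefinitely does not.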
I see what you are saying, but then wouldn't it miss out on the best strategies, which do require patience and not going straight for the apple?
Maybe you could make it lose points for repeating a board state, I guess.
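A rough sketch of what penalizing repeated board states could look like, assuming the board is fully described by the snake's cells (head first) and the apple position. The class name and penalty value are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Cell = Tuple[int, int]


class RepeatStatePenalty:
    """Returns a penalty when a board state recurs within one episode."""

    def __init__(self, penalty: float = -0.1) -> None:
        self.penalty = penalty  # illustrative value, not tuned
        self._seen: Set[Hashable] = set()

    def reset(self) -> None:
        """Call at the start of each episode."""
        self._seen.clear()

    def __call__(self, snake_cells: Iterable[Cell], apple_cell: Cell) -> float:
        # Snake cells in head-to-tail order plus the apple identify the state.
        key = (tuple(map(tuple, snake_cells)), tuple(apple_cell))
        if key in self._seen:
            return self.penalty
        self._seen.add(key)
        return 0.0


# A length-1 snake doing two laps of the same four-cell loop.
bonus = RepeatStatePenalty()
loop = [[(1, 1)], [(0, 1)], [(0, 0)], [(1, 0)]]
rewards = [bonus(cells, (5, 5)) for _ in range(2) for cells in loop]
```

The first lap costs nothing and only the second lap is penalized, so a patient route that never revisits a state is not punished.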
Ah yes, that's probably the right approach.