I noticed snake gets penalized for not getting to the apple early, is that what you really want? Snake is about how long it gets not about the balance between length and wall clock time
Because the sessions would last forever. Think of a 1 or 2 length snake, figuring out that left down up right over and over again doesn't lose any points. You're now trapped in a local minimum. You need to make the AI get impatient (lose points) or it'll never learn.