Hacker News

by lukehack 1964 days ago

The stack is basic. I develop on an old Lenovo laptop and test for a few dozen frames (you can learn a lot without a CUDA GPU) before pushing it to a desktop with a cheap NVIDIA card. It uses PyTorch and PyBoy. The model is just a couple of Conv2d expansions and compressions feeding a Linear layer that outputs the predicted reward for each keypress (basically). Training is based on deep Q-learning. I look at a PyTorch tutorial[1] when I get stuck, but I try to fumble around and work things out myself as much as possible before looking at it.
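
For anyone who wants a concrete picture of that kind of setup, here is a minimal PyTorch sketch. It is not the author's code. The names `QNetwork` and `dqn_loss` are hypothetical, and the layer sizes, the single 144x160 grayscale input frame (the Game Boy's screen resolution), six actions, gamma = 0.99 and the Huber loss are all assumptions. Strictly speaking, the network outputs Q-values (estimated discounted future reward per button) rather than immediate reward. A full DQN setup usually also has a replay buffer, epsilon-greedy exploration and a separate target network, all of which are left out here.

```python
import torch
import torch.nn.functional as F
from torch import nn


class QNetwork(nn.Module):
    """Hypothetical Q-network: screen frames in, one Q-value per button out."""

    def __init__(self, n_actions: int, in_channels: int = 1,
                 height: int = 144, width: int = 160) -> None:
        super().__init__()
        # Conv stack that widens the channel count, then narrows it again
        # (one reading of "expansions and compressions"; sizes are guesses).
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 16, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Push one dummy frame through the conv stack to size the Linear layer.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, in_channels, height, width)).shape[1]
        self.head = nn.Linear(n_flat, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_channels, height, width) -> (batch, n_actions)
        return self.head(self.features(x))


def dqn_loss(net: nn.Module, states: torch.Tensor, actions: torch.Tensor,
             rewards: torch.Tensor, next_states: torch.Tensor,
             dones: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """One-step Q-learning loss: Q(s, a) vs. r + gamma * max_a' Q(s', a')."""
    # Q-value of the action that was actually taken in each transition.
    q_taken = net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target; terminal transitions (dones == 1) keep only r.
        next_q = net(next_states).max(dim=1).values
        target = rewards + gamma * next_q * (1.0 - dones)
    # Huber loss, a common DQN choice that is less sensitive to outliers.
    return F.smooth_l1_loss(q_taken, target)


if __name__ == "__main__":
    torch.manual_seed(0)
    net = QNetwork(n_actions=6)
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
    # Random stand-in batch; real frames would come from the emulator.
    states = torch.rand(4, 1, 144, 160)
    next_states = torch.rand(4, 1, 144, 160)
    actions = torch.tensor([0, 2, 5, 1])
    rewards = torch.tensor([0.0, 1.0, 0.0, -1.0])
    dones = torch.tensor([0.0, 0.0, 0.0, 1.0])
    loss = dqn_loss(net, states, actions, rewards, next_states, dones)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")
```

Sizing the Linear layer with a dummy forward pass avoids hand-computing the conv output shape, so the frame size or kernel choices can change without breaking the head.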

I have an idea to make Q-training propagation variable, scaled by the magnitude of the reward, so that bigger rewards propagate further. I haven't gotten there yet, though.
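
One possible reading of that idea, sketched here as an assumption rather than the author's plan: in Q-learning, a reward spreads backward to earlier states only as the transitions around it get replayed, so replaying large-reward transitions more often should make them propagate faster. The sketch samples a replay batch with probability proportional to (|r| + eps)^alpha. This is similar in spirit to prioritized experience replay, except that method prioritizes by TD error and corrects the resulting bias with importance-sampling weights, which this sketch omits. The function names and the `alpha`/`eps` defaults are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def reward_priorities(rewards: Sequence[float], alpha: float = 0.6,
                      eps: float = 0.01) -> List[float]:
    """Sampling probability per stored transition, growing with |reward|."""
    # eps keeps zero-reward transitions sampleable; alpha flattens (<1) or
    # sharpens (>1) the preference for large rewards.
    weights = [(abs(r) + eps) ** alpha for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def sample_batch(transitions: Sequence[Dict[str, float]], batch_size: int,
                 alpha: float = 0.6, eps: float = 0.01,
                 rng: random.Random = random.Random()) -> List[Dict[str, float]]:
    """Draw a replay batch (with replacement) biased toward large rewards."""
    probs = reward_priorities([t["reward"] for t in transitions], alpha, eps)
    return rng.choices(transitions, weights=probs, k=batch_size)


if __name__ == "__main__":
    memory = [{"reward": 0.0}, {"reward": 0.0}, {"reward": 1.0}, {"reward": 10.0}]
    print(reward_priorities([t["reward"] for t in memory]))
    print(sample_batch(memory, 8, rng=random.Random(42)))
```

An alternative reading is to scale each transition's loss by its reward magnitude instead of its sampling frequency; both bias training toward big rewards, but resampling leaves the per-update step size unchanged.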

Here is a great video on reinforcement learning[2].

[1] https://pytorch.org/tutorials/intermediate/mario_rl_tutorial...

[2] https://www.youtube.com/watch?v=93M1l_nrhpQ&t=3381