| HN Mirror

by beefnugs 354 days ago

Yeah, why isn't there a training structure where you play 5000 games and the reward function is based on doing well in all of them?
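A rough sketch of what "doing well in all of them" could mean as a single number. Everything here is an assumption for illustration: the game names, the scores, and the choice to rescale each game against a random baseline and a reference score before averaging. The rescaling is only there so one high-scoring game cannot dominate the total.

```python
from statistics import mean


def normalized_score(raw, random_score, reference_score):
    """Rescale a raw score so 0.0 is random play and 1.0 is the reference."""
    span = reference_score - random_score
    if span == 0:
        # No usable scale for this game; count it as neutral.
        return 0.0
    return (raw - random_score) / span


def aggregate_reward(results):
    """Average the normalized scores over every game played.

    `results` maps a (hypothetical) game name to a tuple of
    (raw_score, random_score, reference_score).
    """
    return mean(normalized_score(*r) for r in results.values())


# Made-up numbers for two imaginary games.
example_results = {
    "game_a": (50, 0, 100),
    "game_b": (10, 10, 20),
}
reward = aggregate_reward(example_results)
```

With 5000 games the dictionary just gets bigger; the agent is scored on the average, so getting good at one game while ignoring the rest does not pay off.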

I guess it's a totally different level of control: instead of immediately choosing a certain button to press, you need to set longer-term goals, like "press whatever sequence I need to over this time to end up closer to this result".

There is some kind of nested, multidimensional thing to train on here, instead of immediate limited choices.
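One way to picture the nesting is two layers: an outer one that picks a goal, and an inner one that presses buttons for a while to get closer to it. The sketch below is a toy under loud assumptions: a one-dimensional world, three made-up buttons, and hand-written rules standing in for both layers where a real system would learn them.

```python
# Hypothetical buttons and their effect on a 1-D position.
BUTTONS = {"LEFT": -1, "RIGHT": 1, "NOOP": 0}


def low_level_policy(position, goal):
    """Inner layer: choose the single button that moves toward the goal."""
    if position < goal:
        return "RIGHT"
    if position > goal:
        return "LEFT"
    return "NOOP"


def high_level_policy(position, targets):
    """Outer layer: pick the nearest remaining target as the next goal."""
    return min(targets, key=lambda t: abs(t - position))


def run_episode(start, targets, horizon=5, max_steps=100):
    """Alternate goal-setting with short bursts of button presses."""
    position = start
    remaining = list(targets)
    presses = []
    # The step budget is checked once per goal, not once per press.
    while remaining and len(presses) < max_steps:
        goal = high_level_policy(position, remaining)
        # The inner layer gets `horizon` presses to work toward this goal.
        for _ in range(horizon):
            button = low_level_policy(position, goal)
            position += BUTTONS[button]
            presses.append(button)
            if position == goal:
                remaining.remove(goal)
                break
    return position, presses


final_position, presses = run_episode(start=0, targets=[3, -2])
```

The outer layer only decides every few presses, which is the "longer-term goal" part; the inner layer is the "whatever sequence I need" part.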