by beefnugs
354 days ago
Yeahhhh, why isn't there a training structure where you play 5000 games and the reward function is based on doing well in all of them? I guess it's a totally different level of control: instead of immediately choosing a certain button to press, you need to set longer-term goals, like "press whatever sequence I need over this time to end up closer to this result." There is some kind of nested, multidimensional thing to train on here instead of immediate, limited choices.
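To make the "reward based on doing well in all of them" idea concrete, here is a rough toy sketch in Python. Everything in it is hypothetical and only for illustration: `play_game`, `multi_game_reward`, and both policies are made-up names, and the "game" is a guessing task I invented, not any real benchmark or training setup. The point is just that the signal is computed per whole game and then aggregated across many games, rather than handed out per button press.

```python
import random
from statistics import mean


def play_game(policy, game_seed, steps=20):
    """Hypothetical toy game: each step has a hidden target in {0, 1, 2, 3}.

    The policy sees the target only about half the time (obs is None otherwise).
    The score is the fraction of steps where the chosen action matched the target.
    """
    rng = random.Random(game_seed)  # each seed stands in for a different game
    hits = 0
    for _ in range(steps):
        target = rng.randrange(4)
        obs = target if rng.random() < 0.5 else None
        if policy(obs) == target:
            hits += 1
    return hits / steps


def multi_game_reward(policy, n_games=5000):
    """Aggregate one score per game into a single training signal.

    Returns (mean score, worst-game score). Using the worst game is one
    assumed way to force "doing well in all of them" rather than on average.
    """
    scores = [play_game(policy, seed) for seed in range(n_games)]
    return mean(scores), min(scores)


def blind_policy(obs):
    # Ignores the observation and always presses the same button.
    return 0


def informed_policy(obs):
    # Uses the observation when it is available, otherwise falls back to 0.
    return obs if obs is not None else 0


if __name__ == "__main__":
    print("blind:   ", multi_game_reward(blind_policy, n_games=500))
    print("informed:", multi_game_reward(informed_policy, n_games=500))
```

This only covers the "score across many games" half of the comment. The longer-term-goal part (pick a target outcome, then figure out the button sequence that gets there) would need a second, higher-level policy on top, which this sketch does not attempt.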