by dojomouse
2875 days ago
The main reasons they don't do this are that it's a fairly known quantity from an ML perspective (going from sequences of images to representational features), so being able to do it wouldn't prove that much (cf. the various Atari benchmarks, where agents learned actions to achieve rewards working from pixel inputs)... but at the same time it would consume a huge fraction of the compute resources they really want to be targeting at the core timing/tactics/strategy problems... which is where they're really going beyond what's been demonstrated elsewhere with RL. I agree it'll be even cooler when it all JustWorks(tm) end to end, but in terms of incremental 'holyshiticantbelievethatworked' this is at least as big a step as it will be when they add in direct visual input.
|
One of the next significant moments could be taking the current Dota 2 algorithm and massaging it to use human-style inputs and outputs. Please correct me if I'm wrong, but the current Dota 2 algorithm boils down to (1) a fully connected network that generates an input state vector from the Dota 2 bot output interface, (2) an LSTM of sufficient length that generates an output state vector from the input state vector, and (3) another fully connected network that generates the Dota 2 bot interface inputs from the output state vector. This could be updated to have (1a) a convolutional network that feeds into a fully connected network, where the input to the convolutional network is the frame buffer (and perhaps the audio output) and the output of the fully connected network is the input state vector, (2) the same or a similar LSTM network, and (3a) a fully connected network that outputs keyboard and mouse commands instead of Dota 2 bot interface inputs.
It is an open question whether current compute power is sufficient for this change.
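
To make the (1)/(2)/(3) versus (1a)/(2)/(3a) comparison concrete, here is a rough sketch in plain Python. It is not the actual Dota 2 bot code: every name (`Policy`, `LSTMCell`, `conv_encoder`), every layer size, the random untrained weights, and the argmax action choice are assumptions made up for illustration. It only shows that the LSTM core can stay the same while the front end is swapped for a convolutional encoder and the output head is resized for keyboard/mouse commands.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

def dense(n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def apply_dense(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class LSTMCell:
    """(2) A single standard LSTM cell, stepped once per observation."""
    def __init__(self, n_in, n_hidden):
        self.n = n_hidden
        # One dense layer computes all four gate pre-activations at once.
        self.gates = dense(n_in + n_hidden, 4 * n_hidden)

    def step(self, x, h, c):
        z = apply_dense(self.gates, x + h)  # x + h concatenates the two lists
        n = self.n
        i = [sigmoid(v) for v in z[0:n]]              # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]          # forget gate
        g = [math.tanh(v) for v in z[2 * n:3 * n]]    # candidate cell values
        o = [sigmoid(v) for v in z[3 * n:4 * n]]      # output gate
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

def conv_encoder(frame, kernel):
    """(1a) stand-in: one 'valid' 2D cross-correlation (what deep learning
    libraries call convolution) followed by ReLU, flattened to a vector."""
    k = len(kernel)
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(rows - k + 1):
        for col in range(cols - k + 1):
            s = sum(kernel[a][b] * frame[r + a][col + b]
                    for a in range(k) for b in range(k))
            out.append(max(0.0, s))
    return out

class Policy:
    """(1) dense encoder -> (2) LSTM -> (3) dense action head."""
    def __init__(self, n_obs, n_state, n_hidden, n_actions):
        self.encoder = dense(n_obs, n_state)
        self.lstm = LSTMCell(n_state, n_hidden)
        self.head = dense(n_hidden, n_actions)
        self.h = [0.0] * n_hidden
        self.c = [0.0] * n_hidden

    def act(self, obs):
        state = [math.tanh(v) for v in apply_dense(self.encoder, obs)]
        self.h, self.c = self.lstm.step(state, self.h, self.c)
        logits = apply_dense(self.head, self.h)
        # Greedy choice for simplicity; a real agent would sample from a distribution.
        return max(range(len(logits)), key=logits.__getitem__)

# Current setup: observation vector from the bot interface (size 8 is made up).
api_policy = Policy(n_obs=8, n_state=16, n_hidden=12, n_actions=5)
api_action = api_policy.act([0.1] * 8)

# Proposed setup: frame buffer -> conv features -> same kind of core,
# with a head sized for hypothetical keyboard/mouse commands.
frame = [[float((r + c) % 2) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]
features = conv_encoder(frame, kernel)
pixel_policy = Policy(n_obs=len(features), n_state=16, n_hidden=12, n_actions=7)
pixel_action = pixel_policy.act(features)
```

In this toy the only differences between the two setups are what gets fed to the encoder and how wide the output head is. That is roughly the point of the proposal, though a real frame buffer would need a far larger convolutional stack, which is where the compute question comes in.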