by theapadayo
539 days ago
Well, I guess we finally got the mythical 'Q*', or at least some variant of it using energy functions (I think that's what they mean by 'soft' Q-learning?). The extra boost from using the value function at test time is interesting as well.
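For reference, here's a minimal sketch of the standard soft Q-learning pieces (the formulation from Haarnoja et al. 2017, "Reinforcement Learning with Deep Energy-Based Policies"), assuming a discrete action set. Whether the paper in question uses exactly this is my guess, not something it states. The policy is a Boltzmann distribution over Q-values, i.e. an energy-based model with energy E(s, a) = -Q(s, a), and the hard max in the Bellman backup becomes a temperature-scaled log-sum-exp. The temperature `alpha` and the helper names are illustrative.

```python
import math

def soft_value(q_values, alpha=1.0):
    """Soft state value V(s) = alpha * log sum_a exp(Q(s, a) / alpha)."""
    m = max(q_values)  # subtract the max before exponentiating for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))

def soft_policy(q_values, alpha=1.0):
    """Energy-based policy pi(a|s) = exp((Q(s, a) - V(s)) / alpha), with energy -Q."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]

def soft_bellman_target(reward, next_q_values, gamma=0.99, alpha=1.0):
    """Soft Bellman backup: r + gamma * V(s'), using log-sum-exp instead of max."""
    return reward + gamma * soft_value(next_q_values, alpha)

q = [1.0, 2.0, 3.0]
print(soft_policy(q))             # probabilities that sum to 1, highest for the best action
print(soft_value(q, alpha=0.01))  # close to max(q) = 3.0 at low temperature
```

As `alpha` goes to 0, the soft value collapses to the ordinary max and the policy becomes greedy, which is why this is usually described as a "softened" version of regular Q-learning.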