by theapadayo 539 days ago
Well, I guess we finally got the mythical 'Q*', or at least some variant of it using energy functions (I think that's what they mean by 'soft' Q-learning?). The extra boost from using the value function at test time is interesting as well.