| HN Mirror

I'd say getting better sample efficiency is a bigger deal. It isn't like POMDP's are a huge step away theoretically from MDP's. But if you attach one of these things to a robot, taking 10^7 samples to learn a policy is a deal breaker. So fine, please keep using games to research with.