Hacker News
by waldarbeiter 977 days ago
When the policy is deployed in the real world, only the depth camera is used; there are no waypoints etc. Scan dots and target headings are used in Phase 1 of the training to pretrain a policy in simulation. In Phase 2, a policy is trained end-to-end using the pretrained actor network: "First, exteroceptive information is only available in the form of depth images from a front-facing camera instead of scandots. Second, there is no expert to specify waypoints and target directions, these must be inferred from the visible terrain geometry." Phase 2 uses DAgger, which is based on behavior cloning, with the Phase 1 policy as the expert. They also use some tricks to make sure that actions too different from the expert's actions are not executed during training. So in Phase 2 the network learns to extract environment information from the depth camera instead of from the scan dots. The actor network is carried over from Phase 1, but the depth embedding must be learned from scratch. This is how I understand it.
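To make the Phase 2 loop more concrete, here is a minimal DAgger sketch on a made-up 1-D toy problem. It is not the paper's code. `expert_action` stands in for the Phase 1 policy, which has privileged information (the goal). `observe` stands in for what the student can perceive; in the real system that would be an embedding of depth images. `LinearStudent` is a deliberately tiny student fitted by least squares on the aggregated dataset, i.e. behavior cloning. The `max_gap` fallback to the expert's action is only my guess at what "tricks to make sure no actions that are too different from the expert actions are executed" could look like. All names and numbers here are hypothetical.

```python
import random

# Hypothetical toy: a 1-D "robot" at position x should reach GOAL.
GOAL = 5.0


def expert_action(x):
    """Privileged teacher (stand-in for the Phase 1 policy): move halfway to the goal."""
    return 0.5 * (GOAL - x)


def observe(x):
    """Student's only input: a hypothetical 'depth-like' feature of the scene."""
    return 2.0 * (GOAL - x)


class LinearStudent:
    """Tiny student policy: action = w * obs + b."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def act(self, obs):
        return self.w * obs + self.b

    def fit(self, data):
        """Least-squares regression of expert labels on observations (behavior cloning)."""
        n = len(data)
        mean_o = sum(o for o, _ in data) / n
        mean_a = sum(a for _, a in data) / n
        var = sum((o - mean_o) ** 2 for o, _ in data)
        cov = sum((o - mean_o) * (a - mean_a) for o, a in data)
        self.w = cov / var if var > 0 else 0.0
        self.b = mean_a - self.w * mean_o


def dagger(iterations=5, episodes=20, horizon=10, max_gap=0.5, seed=0):
    rng = random.Random(seed)
    student = LinearStudent()
    dataset = []  # aggregated (observation, expert label) pairs across all iterations
    for _ in range(iterations):
        for _ in range(episodes):
            x = rng.uniform(-5.0, 15.0)
            for _ in range(horizon):
                obs = observe(x)
                a_expert = expert_action(x)
                a_student = student.act(obs)
                # DAgger step: the expert labels every state the rollout visits.
                dataset.append((obs, a_expert))
                # Assumed safeguard: execute the expert's action instead when the
                # student's action strays too far from it.
                a = a_student if abs(a_student - a_expert) <= max_gap else a_expert
                x += a
        # Retrain the student on everything collected so far.
        student.fit(dataset)
    return student, dataset
```

After training, the student alone drives the toy robot using only its observations, e.g. start at `x = 0.0` and repeat `x += student.act(observe(x))`. The real setup differs in ways this toy does not model: the student reuses the pretrained Phase 1 actor, and the depth embedding has to be learned from raw images.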

Thank you for your comment. What I don't understand is this: when the robot is in a new environment, how does it know where it's supposed to go? My understanding is that the training teaches the robot how to get to a position, but I didn't see anything about how it chooses where to go (in "old AI" parlance, what would have been called planning).