by waldarbeiter
977 days ago
When the policy is deployed in the real world, only the depth camera is used, with no waypoints etc. Scandots and the target heading are used only in Phase 1 of training, to pretrain a policy in simulation. In Phase 2, a policy is trained end-to-end starting from the pretrained actor network: "First, exteroceptive information is only available in the form of depth images from a front-facing camera instead of scandots. Second, there is no expert to specify waypoints and target directions, these must be inferred from the visible terrain geometry." For policy training in Phase 2 they use DAgger, which builds on behavior cloning, with the Phase 1 policy as the expert. They also use some tricks to make sure that no actions too different from the expert's actions are executed during training. So in Phase 2 the network learns to extract environment information from the depth camera instead of from the scandots. The actor network is initialized from Phase 1, but the depth embedding has to be learned from scratch. This is how I understand it.
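
To make the Phase 2 idea concrete, here is a rough toy sketch of a DAgger loop as I picture it. It is not the paper's code. Everything in it is an assumption for illustration: the 1-D state, the linear "expert" standing in for the Phase 1 policy, the `observe` function standing in for the depth camera, and the `max_deviation` gate standing in for whatever tricks the authors actually use.

```python
import random


def expert_action(state):
    # Hypothetical Phase 1 expert: acts on privileged state (think scandots).
    return -0.5 * state


def observe(state):
    # Hypothetical stand-in for the depth camera: a different encoding of the state.
    return 2.0 * state + 1.0


def fit_linear(observations, actions):
    # Closed-form 1-D least squares: action ~ w * observation + b.
    n = len(observations)
    mean_o = sum(observations) / n
    mean_a = sum(actions) / n
    var_o = sum((o - mean_o) ** 2 for o in observations)
    cov = sum((o - mean_o) * (a - mean_a) for o, a in zip(observations, actions))
    w = cov / var_o
    return w, mean_a - w * mean_o


def dagger(iterations=5, episodes=10, horizon=20, max_deviation=0.3, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # untrained student; it only ever sees observe(state)
    data_obs, data_act = [], []
    for _ in range(iterations):
        for _ in range(episodes):
            state = rng.uniform(-2.0, 2.0)
            for _ in range(horizon):
                obs = observe(state)
                a_expert = expert_action(state)
                a_student = w * obs + b
                # Aggregate: every visited observation is labeled by the expert.
                data_obs.append(obs)
                data_act.append(a_expert)
                # Assumed safety gate: fall back to the expert's action when
                # the student deviates too much from it.
                if abs(a_student - a_expert) <= max_deviation:
                    executed = a_student
                else:
                    executed = a_expert
                state = state + executed + rng.gauss(0.0, 0.05)
        # Behavior cloning step on the whole aggregated dataset.
        w, b = fit_linear(data_obs, data_act)
    return w, b


student_w, student_b = dagger()
print(student_w, student_b)
```

The point of the toy is only the data flow: the student never sees the privileged state, the states it trains on come from its own (gated) rollouts, and the labels always come from the expert.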