by richard___
3255 days ago
Summary:

Semantics part: The idea seems to be that we can "transfer" knowledge from previously labeled samples, so we don't need to do as much new work labeling sample images with semantic labels.

Grasping part: "Emulating human movements with self-supervision and imitation." High-level imitation based on visual frame differences avoids the need to manually control actuators. Not sure how this works exactly.

Two-stream model: the ventral network asks "What class is this?" and the dorsal network asks "Is this how we should grasp this object?" The benefit is that we can use all the automatically generated (robot-generated) grasping data without having a human supervise every automated grasp, e.g. by saying "This is a successful way to pick up this object, and also this object is an apple." The ventral network ties the grasping data (which has no object labels) back to object labels, which allows semantic control of the trained robot, e.g. "Pick up that apple."
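To make the two-stream idea concrete, here is a minimal sketch of how the two outputs might be combined to answer a command like "Pick up that apple". Everything in it is an assumption for illustration, not the paper's confirmed method: the label set `CLASSES`, the helper names, the made-up logits, and in particular the choice to score a candidate grasp as the product of the ventral class probability and the dorsal grasp-success probability.

```python
import math

# Hypothetical label set, for illustration only.
CLASSES = ["apple", "mug", "stapler"]


def softmax(logits):
    """Turn raw ventral-stream scores into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Turn a raw dorsal-stream score into a grasp-success probability."""
    return 1.0 / (1.0 + math.exp(-x))


def semantic_grasp_score(ventral_logits, dorsal_logit, target):
    """Assumed combination rule: P(class == target) * P(grasp succeeds)."""
    p_class = softmax(ventral_logits)[CLASSES.index(target)]
    p_grasp = sigmoid(dorsal_logit)
    return p_class * p_grasp


def pick_best(candidates, target):
    """Return the candidate grasp with the highest joint score for the target."""
    return max(
        candidates,
        key=lambda c: semantic_grasp_score(
            c["ventral_logits"], c["dorsal_logit"], target
        ),
    )


# Made-up network outputs for three candidate grasps.
candidates = [
    # Probably an apple, but the grasp is likely to fail.
    {"name": "grasp_a", "ventral_logits": [3.0, 0.0, 0.0], "dorsal_logit": -2.0},
    # Probably an apple, and the grasp is likely to succeed.
    {"name": "grasp_b", "ventral_logits": [2.0, 0.0, 0.0], "dorsal_logit": 2.0},
    # A good grasp, but probably on a mug.
    {"name": "grasp_c", "ventral_logits": [0.0, 3.0, 0.0], "dorsal_logit": 3.0},
]

best = pick_best(candidates, "apple")
print(best["name"])
```

The point of the split is visible in the data: the dorsal score needs no object labels, so it could be trained on unlabeled robot grasp attempts, while the ventral score is the only part that needs semantic labels.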