Scanning through the paper, I see this: "We structure our policy into two subnetworks, one of which receives only proprioceptive information, and the other which receives only exteroceptive information. As explained in the previous paragraph with proprioceptive information we refer to information that is independent of any task and local to the body while exteroceptive information includes a representation of the terrain ahead. We compared this architecture to a simple fully connected neural network and found that it greatly increased learning speed."
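To make the quoted architecture concrete, here is a rough sketch of a policy with two separate input branches. The quote doesn't say how the two subnetworks are combined, so the concatenate-then-linear-head step, the layer sizes, and every name here (`TwoBranchPolicy`, `act_mean`, etc.) are my own assumptions, not the paper's design.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Create one fully connected layer: random weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x, activation=math.tanh):
    """Apply a layer to input vector x, then the activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class TwoBranchPolicy:
    """Hypothetical policy: one branch per input type, merged by a linear head."""

    def __init__(self, n_proprio, n_extero, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        # Sees only body-local state (e.g. joint angles).
        self.proprio_branch = dense(n_proprio, n_hidden, rng)
        # Sees only the terrain representation.
        self.extero_branch = dense(n_extero, n_hidden, rng)
        # Assumption: the branch outputs are concatenated and fed to one head.
        self.head = dense(2 * n_hidden, n_actions, rng)

    def act_mean(self, proprio, extero):
        h_p = forward(self.proprio_branch, proprio)
        h_e = forward(self.extero_branch, extero)
        # List concatenation, then a linear (identity-activation) output layer.
        return forward(self.head, h_p + h_e, activation=lambda v: v)


# Illustrative sizes only.
policy = TwoBranchPolicy(n_proprio=4, n_extero=6, n_hidden=8, n_actions=2)
mean = policy.act_mean([0.1] * 4, [0.0] * 6)
```

Either way, both branches are ordinary neural network layers; the split only restricts which inputs each one gets to see.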

It seems to me they do use neural nets: the policy described above is one. Proximal Policy Optimization is just a newer way of training them.
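For reference, this is roughly what PPO optimizes, assuming the clipped-surrogate variant from the original PPO paper (I haven't checked which variant this paper uses). The function name and the `clip_eps=0.2` value are illustrative. The point is that it is only a loss over the network's action probabilities; the thing being trained is still a neural net.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of each action under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio within [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In training you would maximize this with gradient ascent on the network weights (or minimize its negative).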