Hacker News
by symstym 3378 days ago
The example code inline in the article just illustrates the basic idea of Evolution Strategies (ES), not their new work in applying ES.

The behavior of agents is determined by a "policy function". This function takes in inputs (e.g. what the agent sees) and outputs actions (e.g. what the agent does). The policy function has a set of internal parameters that determine the precise mapping from inputs to outputs.

In their work, they used a neural network as the policy function. The parameters are just all the weights of the network.
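To make "policy function with parameters" concrete, here is a minimal sketch of a tiny network whose weights live in one flat list. The names `policy` and `num_params`, the layer sizes, the tanh activation, and picking the highest-scoring action are all my own assumptions for illustration, not details from the article.

```python
import math
import random

def num_params(n_in, n_hidden=4, n_actions=2):
    """Number of weights in the toy network (no biases, for brevity)."""
    return n_in * n_hidden + n_hidden * n_actions

def policy(weights, observation, n_hidden=4, n_actions=2):
    """Map an observation (list of floats) to an action index.

    `weights` is a flat list; it is sliced into two weight matrices.
    """
    n_in = len(observation)
    split = n_in * n_hidden
    w1 = weights[:split]                            # input -> hidden
    w2 = weights[split:split + n_hidden * n_actions]  # hidden -> action scores
    hidden = [
        math.tanh(sum(observation[i] * w1[i * n_hidden + j] for i in range(n_in)))
        for j in range(n_hidden)
    ]
    scores = [
        sum(hidden[j] * w2[j * n_actions + k] for j in range(n_hidden))
        for k in range(n_actions)
    ]
    # Choose the action with the highest score.
    return max(range(n_actions), key=lambda k: scores[k])

# Random weights give an arbitrary (but deterministic) input -> action mapping.
rng = random.Random(0)
weights = [rng.gauss(0, 1) for _ in range(num_params(3))]
action = policy(weights, [0.5, -1.0, 2.0])
```

Changing any entry of `weights` can change which action comes out for a given observation, which is all "the parameters determine the mapping" means here.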

In a simple version, you start with some random weights for the NN. Then you make many copies of the network, each with a slight random variation applied to its weights. You use each of these altered networks to control an agent for a while, and see how well the agent performs during that trial period. Based on how well the different variations do during their trial runs, you adjust the weights of the original network by a small amount, moving them toward the variations that did well. Then you repeat the process indefinitely (generate new variations, test them, etc.).
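The loop described above can be sketched roughly as follows. This is a toy version under stated assumptions: `run_trial` is a hypothetical stand-in for "control an agent for a while and score it" (here it just rewards weights for being close to a made-up target), and normalizing the rewards before the update is a common choice I've added, not something the description above specifies.

```python
import random

TARGET = [0.5, -0.3, 0.8]  # made-up "ideal" weights for the toy problem

def run_trial(weights):
    """Hypothetical stand-in for a trial run: higher reward is better."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def evolution_strategy(n_params=3, population=50, sigma=0.1,
                       step_size=0.01, iterations=200, seed=0):
    rng = random.Random(seed)
    # Start with some random weights.
    weights = [rng.gauss(0, 1) for _ in range(n_params)]
    for _ in range(iterations):
        # Make many copies, each with a slight random variation.
        noises = [[rng.gauss(0, 1) for _ in range(n_params)]
                  for _ in range(population)]
        # Test each variation and record how well it did.
        rewards = [run_trial([w + sigma * e for w, e in zip(weights, noise)])
                   for noise in noises]
        # Normalize rewards so better-than-average variations get positive
        # scores and worse-than-average ones get negative scores.
        mean = sum(rewards) / population
        std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 or 1.0
        scores = [(r - mean) / std for r in rewards]
        # Nudge the weights toward the variations that did well.
        for i in range(n_params):
            direction = sum(s * noise[i] for s, noise in zip(scores, noises))
            weights[i] += step_size * direction / (population * sigma)
    return weights

final_weights = evolution_strategy()
print(final_weights, run_trial(final_weights))
```

With a real agent you would replace `run_trial` with something that loads the weights into the policy network and returns the total reward from an episode; the rest of the loop stays the same.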

1 comments

Very good, sir! Makes more sense now! Thank you.