by symstym
3378 days ago
The example code inline in the article just illustrates the basic idea of Evolution Strategies (ES), not their new work in applying ES.

The behavior of an agent is determined by a "policy function". This function takes inputs (e.g. what the agent sees) and outputs actions (e.g. what the agent does). It has a set of internal parameters that determine the precise mapping from inputs to outputs. In their work, they used a neural network as the policy function, so the parameters are simply all the weights of the network.

In a simple version, you start with random weights for the NN. Then you make many copies of the network, each with a slight random variation applied to the weights. You use each altered network to control an agent for a while and see how well the agent performs during that trial period. Based on how well the different variations do in their trial runs, you adjust the weights of the network by a small amount, making them more similar to the variations that did well. Then you repeat the process indefinitely (generate new variations, test them, etc.).
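
To make the loop concrete, here is a rough sketch of that simple version in plain Python. It is not the code from the article or from their work. The `evaluate` function and `TARGET` are hypothetical stand-ins for "run the agent for a trial period and score it", the hyperparameter values are arbitrary, and standardizing the scores before the update is my own assumption about how to weight the variations.

```python
import random

# Hypothetical "ideal" weights, used only so the toy score below has a known optimum.
TARGET = [0.5, 0.1, -0.3]


def evaluate(weights):
    """Hypothetical stand-in for a trial run; higher is better.

    A real setup would use the weights in a policy network, let it control
    an agent for a while, and return the reward it collected.
    """
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def evolution_strategy(n_params=3, n_copies=50, sigma=0.1,
                       step_size=0.002, n_iters=500, seed=0):
    rng = random.Random(seed)
    # Start with some random weights.
    weights = [rng.gauss(0, 1) for _ in range(n_params)]

    for _ in range(n_iters):
        # Make many copies, each with a slight random variation of the weights.
        noises = [[rng.gauss(0, 1) for _ in range(n_params)]
                  for _ in range(n_copies)]
        # Give each variation a trial run and record how well it did.
        scores = [evaluate([w + sigma * e for w, e in zip(weights, noise)])
                  for noise in noises]

        # Assumption: standardize scores so that better-than-average
        # variations get positive weight and worse ones get negative weight.
        mean = sum(scores) / n_copies
        std = (sum((s - mean) ** 2 for s in scores) / n_copies) ** 0.5 or 1.0
        advantages = [(s - mean) / std for s in scores]

        # Nudge the weights a small amount toward the variations that did well.
        for j in range(n_params):
            direction = sum(a * noise[j] for a, noise in zip(advantages, noises))
            weights[j] += step_size / (n_copies * sigma) * direction

    return weights


if __name__ == "__main__":
    print(evolution_strategy())
```

Note that nothing here computes a gradient of `evaluate`; the update only uses the scores from the trial runs, which is why the same loop works when the score comes from running an agent in an environment.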