by pmalynin
2948 days ago
Once again, there is a general conflation of evolutionary algorithms with learning without a differentiable error function. I've made this argument before, in a discussion with antirez here on HN [1], but the crux of it is that Reinforcement Learning as it stands is optimization over (possibly) non-differentiable errors. For AlphaGo there is no gradient, per se, that says you will maximize your wins if you move in this direction (it is instead trained towards the win-rate "score", which can be treated as an error score); see REINFORCE and its variants for other examples. Evolutionary Learning and Reinforcement Learning are two sides of the same coin.

[1] https://news.ycombinator.com/item?id=16652138
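To make the "two sides of the same coin" point concrete, here is a minimal sketch in plain Python (standard library only). The reward function, threshold, step size and sample counts are all made up for illustration. The "win" signal is a step function, so its derivative is zero almost everywhere and ordinary gradient descent gets nothing. A Gaussian-perturbation evolution strategy (perturb the parameter, weight each perturbation by its score) and REINFORCE with a Gaussian policy (sample an action, weight the score function by the reward) both still climb it, and with the same random stream they produce the same gradient estimate.

```python
import math
import random


def reward(x: float) -> float:
    """Hypothetical non-differentiable 'win' signal: 1 past a threshold, else 0.
    Its derivative is zero almost everywhere, so backprop gets no signal."""
    return 1.0 if x > 2.0 else 0.0


def es_gradient(theta: float, sigma: float, n: int, rng: random.Random) -> float:
    """Evolution strategies: perturb the *parameter* with Gaussian noise and
    weight each noise sample by the reward it earns."""
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        total += reward(theta + sigma * eps) * eps
    return total / (n * sigma)


def reinforce_gradient(theta: float, sigma: float, n: int, rng: random.Random) -> float:
    """REINFORCE: sample an *action* from the policy N(theta, sigma^2) and weight
    the score function d/dtheta log pi(x) = (x - theta) / sigma^2 by the reward."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, sigma)
        total += reward(x) * (x - theta) / sigma**2
    return total / n


def optimize(grad_fn, steps: int = 200, lr: float = 0.5, sigma: float = 1.0,
             n: int = 200, seed: int = 0) -> float:
    """Plain stochastic gradient ascent on the expected reward, starting at 0."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        theta += lr * grad_fn(theta, sigma, n, rng)
    return theta


# Same seed, same noise: the two estimators agree up to float rounding.
print(es_gradient(0.0, 1.0, 200, random.Random(1)))
print(reinforce_gradient(0.0, 1.0, 200, random.Random(1)))
# Both push theta from 0 (reward 0) past the threshold (reward 1).
print(optimize(es_gradient), optimize(reinforce_gradient))
```

When the policy is a Gaussian over the very quantity being optimized, "perturb the parameters" (ES) and "sample an action and use the likelihood ratio" (REINFORCE) are literally the same estimator; neither ever differentiates the reward itself.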