by sepranu
2949 days ago
The points the author makes about gradient descent are accurate, in a sense. However, they oversimplify both the technique (as it is applied today) and the context in which it is used. The author, like many others, seems to understand the basic mechanisms of the subject but not the context in which experts understand them.

The example the author cites regarding evolutionary algorithms learning physical laws is laughable. "It's just not in the data - it has to be invented" applies equally to backprop and to evolutionary learning algorithms.

"In this case, the representation (mathematical expressions represented as trees) is distinctly non-differentiable, so could not even in principle be learned through gradient descent." This is incorrect; it is almost like saying NLP data is not differentiable, yet networks learn from discrete text all the time. For instance, make this representation the output of a network (or, if you want to be fancier, the central component of an autoencoder), and measure how well it predicts or correlates with the experimental data. That mismatch is the error, and it is back-propagated through the network's nodes.

FWIW, many theoreticians believe that the unreasonable effectiveness of neural networks, and especially of transfer learning, results from how well suited they are to encoding the laws of physics and Euclidean geometry.

The author's final points about a nine-year-old survey may be out of date with respect to contemporary neural networks, which often find spookily good local minima and do not behave the way intuition about gradient descent might suggest.
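To make the "put the tree representation behind a network and back-propagate the error" point concrete, here is a minimal sketch in plain Python under deliberately tiny assumptions: one binary node of an expression tree whose operator is picked from a hypothetical candidate set (add, sub, mul). The discrete choice is relaxed into softmax weights, which is a continuous relaxation swapped in for illustration rather than the exact setup described above. The prediction error on synthetic data is back-propagated through the softmax by hand, and the largest weight is read off as the discrete operator at the end. The names `OPS` and `fit_soft_expression`, the data grid, and the hyperparameters are all illustrative assumptions.

```python
import math

# Hypothetical candidate operators for a single binary tree node (an illustrative assumption).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_soft_expression(data, steps=2000, lr=0.05):
    """Fit y ~ c * sum_k p_k * op_k(x1, x2) by gradient descent, with p = softmax(w)."""
    names = list(OPS)
    w = [0.0] * len(names)  # operator logits: the relaxed "which operator?" choice
    c = 1.0                 # learnable output scale
    n = len(data)
    for _ in range(steps):
        p = softmax(w)
        grad_p = [0.0] * len(names)
        grad_c = 0.0
        for x1, x2, y in data:
            outs = [OPS[k](x1, x2) for k in names]
            s = sum(pk * ok for pk, ok in zip(p, outs))
            r = c * s - y  # prediction error on this sample
            grad_c += 2 * r * s / n
            for k, ok in enumerate(outs):
                grad_p[k] += 2 * r * c * ok / n
        # Back-propagate through the softmax: dL/dw_j = p_j * (g_j - sum_k p_k * g_k)
        mean_g = sum(pk * gk for pk, gk in zip(p, grad_p))
        grad_w = [pj * (gj - mean_g) for pj, gj in zip(p, grad_p)]
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        c -= lr * grad_c
    p = softmax(w)
    loss = sum(
        (c * sum(pk * OPS[k](x1, x2) for pk, k in zip(p, names)) - y) ** 2
        for x1, x2, y in data
    ) / n
    return dict(zip(names, p)), c, loss


# Synthetic "experimental" data from a product law, y = x1 * x2 (an assumption for the demo).
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
data = [(x1, x2, x1 * x2) for x1 in grid for x2 in grid]
probs, scale, loss = fit_soft_expression(data)
print(max(probs, key=probs.get), round(scale, 2), loss)
```

On this symmetric grid the three candidate outputs are mutually orthogonal, so the `mul` weight should come to dominate. A real symbolic-regression problem with deep trees and many operators would be far less tidy; the sketch only shows that the error signal can reach a tree-shaped choice.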