by baron_harkonnen
1719 days ago
Given the current state of automatic differentiation, I'm not sure it's necessary or even particularly useful to focus on backpropagation any more. Backprop has major historical significance, but in the end it's essentially a mechanical calculation that no longer needs to be done by hand. Don't get me wrong: I still believe that understanding the gradient is hugely important, and conceptually it will always be essential to understand that one optimizes a neural network by taking the derivative of the loss function with respect to its parameters. But deriving backprop by hand is neither necessary nor particularly useful for modern neural networks (nobody is computing gradients by hand for transformers). IMHO a better approach is to focus on a tool like JAX, where taking a derivative is abstracted away cleanly, yet you remain fully aware of the calculus being done. Especially for programmers, it's better to look at neural networks as just one specific application of differentiable programming. This makes them easier to understand and also opens up a much broader class of problems the learner can solve with the same tools.
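To make that concrete, here's a minimal sketch of what I mean, fitting a line with plain gradient descent in JAX. The toy data, the learning rate, and the names `predict`, `loss`, and `grad_loss` are all just my own illustration; the only library pieces assumed are `jax.grad` and `jax.numpy`. You write the loss as an ordinary function, ask for its derivative, and the update rule is still the calculus you'd write on paper.

```python
import jax
import jax.numpy as jnp


def predict(params, x):
    # A one-layer "network": a linear map plus a bias.
    w, b = params
    return x @ w + b


def loss(params, x, y):
    # Mean squared error between predictions and targets.
    return jnp.mean((predict(params, x) - y) ** 2)


# jax.grad differentiates with respect to the first argument (params)
# and returns gradients with the same tuple structure as params.
grad_loss = jax.grad(loss)

# Toy data (illustrative only): y = 3x + 1
x = jnp.linspace(-1.0, 1.0, 32).reshape(-1, 1)
y = 3.0 * x[:, 0] + 1.0

params = (jnp.zeros(1), jnp.array(0.0))  # initial (w, b)
learning_rate = 0.1

for _ in range(500):
    grads = grad_loss(params, x, y)
    # Plain gradient descent: step each parameter against its gradient.
    params = tuple(p - learning_rate * g for p, g in zip(params, grads))

w, b = params
```

After the loop, `w` and `b` should end up close to the slope and intercept used to generate the data. Nothing in there is specific to neural networks: swap `predict` for any differentiable function and the same loop applies.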