by niels_olson
3451 days ago
Karpathy's explanation in CS231n is great: every computational unit in the graph is a function, and for any function you can calculate a derivative. So to propagate the derivative backward through a unit, just apply the chain rule: multiply the gradient arriving from the output side by that unit's local derivative. This is much easier with simpler functions, so understand your algorithm down to the smallest computable units in the graph.
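To make that concrete, here is a minimal sketch in plain Python. The function f(x, y, z) = (x + y) * z and the name `backprop_example` are my own illustrative choices, not something taken from the course materials verbatim. The graph has two tiny units, an add and a multiply, and each one only needs its own local derivative.

```python
def backprop_example(x, y, z):
    """Forward and backward pass for f(x, y, z) = (x + y) * z.

    Hypothetical toy example: the graph is split into two units,
    an add gate (q = x + y) and a multiply gate (f = q * z).
    """
    # Forward pass: evaluate each unit in order.
    q = x + y          # add gate
    f = q * z          # multiply gate

    # Backward pass: start from df/df = 1 and walk the graph in reverse.
    df = 1.0

    # Multiply gate: local derivatives are d(q*z)/dq = z and d(q*z)/dz = q.
    dq = z * df
    dz = q * df

    # Add gate: local derivatives are d(x+y)/dx = 1 and d(x+y)/dy = 1,
    # so the incoming gradient dq passes through unchanged (chain rule).
    dx = 1.0 * dq
    dy = 1.0 * dq

    return f, (dx, dy, dz)


if __name__ == "__main__":
    value, grads = backprop_example(-2.0, 5.0, -4.0)
    print(value, grads)
```

Neither gate knows anything about the rest of the graph. Each just multiplies the gradient it receives by its local derivative and hands the result upstream, which is why breaking things into small units helps.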