by tel 2258 days ago
Many frameworks already compute derivatives, but they don't use a symbolic representation. Instead they use a method called "automatic differentiation", which roughly (a) extracts a trace of the algorithm by executing the code with dummy arguments, then (b) uses the chain rule to compute the component derivatives at each node in the execution graph and combines them into the final answer.
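To make (a) and (b) concrete, here is a minimal sketch of reverse-mode AD in Python. It is not how any particular framework does it: the names `Var`, `sin`, `backward` and `f` are made up for illustration, and the trace is recorded by operator overloading while running on real inputs rather than on dummy arguments.

```python
import math

class Var:
    """A value on the recorded trace, linked to the values it was computed from."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1, d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b, d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def sin(x):
    # d sin(a)/da = cos(a)
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))

def backward(out):
    """Apply the chain rule node by node, from the output back to the inputs."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)  # post-order: inputs come before the nodes that use them

    visit(out)
    out.grad = 1.0
    for v in reversed(order):  # every node is handled after all of its consumers
        for parent, local in v.parents:
            parent.grad += local * v.grad

def f(x, y):
    # Ordinary-looking code; running it on Var inputs records the trace.
    return x * y + sin(x)

x, y = Var(2.0), Var(3.0)
out = f(x, y)
backward(out)
print(x.grad, y.grad)  # df/dx = y + cos(x), df/dy = x
```

Note that each op still has to state its own local derivative by hand; only the bookkeeping that strings them together is automatic.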

These methods are much faster than perturbation-based (finite-difference) derivatives and much more widely applicable than symbolic methods (which cannot be automatically extracted from a program).
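For comparison, a rough sketch of the perturbation-based approach, assuming a simple forward difference. The function `f`, the `counter` dict and the step size `eps` are illustrative choices, not taken from any framework. The point is the cost: one extra evaluation of the function per input, whereas reverse-mode AD gets the whole gradient from one forward and one backward pass.

```python
def finite_difference_gradient(f, xs, eps=1e-6):
    """Approximate the gradient of f at xs by bumping one input at a time."""
    base = f(xs)
    grad = []
    for i in range(len(xs)):
        bumped = list(xs)
        bumped[i] += eps
        grad.append((f(bumped) - base) / eps)
    return grad

counter = {"calls": 0}  # counts how many times f is evaluated

def f(xs):
    counter["calls"] += 1
    return sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0, 4.0]
grad = finite_difference_gradient(f, xs)
print(grad)              # approximately 2 * x for each input
print(counter["calls"])  # one baseline call plus one call per input
```

With a few inputs this is harmless; with millions of parameters it means millions of evaluations, and the result is still only approximate.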

1 comment

Not sure what you mean by “automatically extracted from a program”; all DL frameworks manually write the backward pass for each op.
I mean the tracing operation that produces a structure appropriate for AD computation. I agree with you that there's work needed to specify the node derivatives.

Although, honestly, I misspoke. The difference between AD and symbolic differentiation is more subtle. Really, AD wins because it uses AST representations to keep a graph of shared intermediate values, while symbolic methods can blow up exponentially (or require clever, difficult-to-generalize tricks to reconstruct that graph).
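A toy illustration of that blow-up, under assumptions of my own choosing: the expression is e_0 = x, e_{i+1} = e_i * e_i, the symbolic differentiator is deliberately naive (no simplification, no sharing of subexpressions), and all function names are hypothetical. The AD side carries one value and one derivative per step instead of building an expression.

```python
import math

def square_chain(k):
    """Build e_k with e_0 = "x" and e_{i+1} = e_i * e_i as nested tuples."""
    e = "x"
    for _ in range(k):
        e = ("*", e, e)  # both children are the same shared intermediate
    return e

def diff(e):
    """Naive symbolic derivative with respect to x; sharing is lost."""
    if isinstance(e, str):
        return 1
    if isinstance(e, (int, float)):
        return 0
    op, a, b = e
    if op == "+":
        return ("+", diff(a), diff(b))
    # product rule: (a * b)' = a' * b + a * b'
    return ("+", ("*", diff(a), b), ("*", a, diff(b)))

def tree_size(e):
    """Number of nodes when the expression is written out in full."""
    if not isinstance(e, tuple):
        return 1
    return 1 + tree_size(e[1]) + tree_size(e[2])

def evaluate(e, x):
    if isinstance(e, str):
        return x
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    left, right = evaluate(a, x), evaluate(b, x)
    return left + right if op == "+" else left * right

def ad_value_and_derivative(x, k):
    """Forward-mode AD: k steps, each reusing the previous intermediate value."""
    v, dv = x, 1.0
    for _ in range(k):
        v, dv = v * v, dv * v + v * dv
    return v, dv

k, x = 10, 1.01
expr = square_chain(k)
deriv = diff(expr)
v, dv = ad_value_and_derivative(x, k)
print(tree_size(expr), tree_size(deriv))  # written-out sizes grow exponentially in k
print(evaluate(deriv, x), dv)             # both compute the same derivative
```

A real computer algebra system would simplify and share subexpressions, which is exactly the "clever tricks" part; the sketch only shows what happens without them.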