Hacker News new | ask | show | jobs
by superpermutat0r 2258 days ago
Function getting differentiated regularly is a loss function defined as (DesiredOutput(x) - HugeNumberOfParametersAppliedTo(x))^2. Are you saying that the symbolic expression gets transformed and is then used to represent the gradient?

I thought that PyTorch, Tensorflow and similar already do that.

1 comments

Many frameworks already compute derivatives, but they don't use a symbolic representation. Instead they use a method called "automatic differentiation" which does something along the lines of (a) extracts a trace of the algorithm by executing the code with dummy arguments, then (b) uses the chain rule to compute component derivatives at each node in the execution tree and combine them into the final answer.

These methods are much faster than perturbation-based derivatives and much more applicable than symbolic methods (which cannot be automatically extracted from a program).

Not sure what you mean by “automatically extracted from a program”, all DL frameworks manually write backward pass for each op.
I mean the tracing operation that produces a structure appropriate for AD computation. I agree with you that there's work needed to specify the node derivatives.

Although, honestly, I misspoke. The difference between AD and symbolic differentiation is more subtle. Really AD is profiting because it uses AST representations to keep a graph of intermediate values while symbolic methods can blow up exponentially (or require clever, difficult to generalize tricks to reconstruct that graph).