Hacker News
by big-chungus4 752 days ago
Julia's autodiff packages, as well as PyTorch, can differentiate through branching code: the gradients simply flow through whichever branch was taken in the forward pass. However, the derivative with respect to a value that appears only in the branch condition, such as `a` in `if i > a`, is zero almost everywhere. If you plot how the function value depends on `a`, the graph is flat, with a single jump at the point where `i` becomes bigger than `a`, so the true derivative carries no information about which way to move `a`. DiscoGrad, on the other hand, doesn't use true mathematical derivatives; instead it calculates useful, smoothed gradient approximations for those conditionals.
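To make that concrete, here is a rough sketch in plain Python. It is not DiscoGrad's API or its actual algorithm; it just assumes Gaussian smoothing of `a` as one simple way to get a non-zero gradient out of a branch. The names `step`, `finite_diff`, `smoothed_step`, `smoothed_grad` and the parameter `sigma` are all made up for illustration.

```python
import math

def step(a, i=1.0):
    """Branch whose condition depends on `a`: 1.0 if i > a, else 0.0."""
    if i > a:
        return 1.0
    return 0.0

def finite_diff(f, a, h=1e-6):
    """Central finite-difference estimate of df/da."""
    return (f(a + h) - f(a - h)) / (2 * h)

def smoothed_step(a, i=1.0, sigma=0.5):
    """E[step(a + sigma * eps)] for eps ~ N(0, 1), in closed form.

    This equals P(eps < (i - a) / sigma), i.e. the normal CDF at z.
    """
    z = (i - a) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_grad(a, i=1.0, sigma=0.5):
    """Derivative of smoothed_step with respect to `a`: -pdf(z) / sigma."""
    z = (i - a) / sigma
    return -math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Away from the jump at a == i the raw derivative is exactly zero,
# while the smoothed one is non-zero and largest in magnitude near the jump.
for a in (0.0, 0.9, 2.0):
    print(a, finite_diff(step, a), smoothed_grad(a))
```

The raw column is zero at every sampled point, while the smoothed gradient is negative everywhere (raising `a` makes the branch less likely to be taken). A larger `sigma` gives a wider but flatter signal, so it trades locality for reach.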