Hacker News new | ask | show | jobs
by SoerenL 2776 days ago
As one of the authors: Yes, it is true, 3 orders of magnitude (so about a factor of 1000 on GPUs). But please be careful, this holds only for higher order derivatives (like Hessians, or Jacobians), as stated in the paper and as the title says. For gradients, the speedup is rather limited.
3 comments

Can you briefly explain what the approach is? I’ve briefly skimmed the paper and I feel like I am missing something. Is it auto diff on the component wise expressions, then somehow figuring out a way to evaluate those expressions using matrices?
That's what is usually done, autodiff on the component wise expression. We don't do it here. Instead, we really compute on the matrix and tensor level and compute derivatives here directly. Let me give you a simple example to illustrate it: Consider the function f(x)=x'Ax. Then, its Hessian is A+A'. This expression is what we compute. And evaluating this expression is orders of magnitude faster than what TF, PyTorch, etc. do, namely autodiff on the components. Hope this helps, if not, please let me know.
Do you think this could end up being implemented in TF and Pytorch?
Not in its current formulation. It uses a different representation of the tensors. However, a new version/algorithm that will be available in a few months can be used in TF and PyTorch.
Are you referring to the Ricci notation when you are saying it uses a different representation of the tensors? Do you also plan to add non-differentiable functions like relu?
Yeah, I am referring to Ricci notation. TF and PyTorch don't use it. The new version already has relus. This version is targeted at standard formulas (has abs as a non-differentiable function), next version works for deep learning. Never wanted to work on this but there is just no way around it.
Can this be used for faster earth mover distance calculation?
Do you need Hessians or Jacobians for computing the earth mover distance, then a definite yes. Otherwise, I would doubt it (though I do not know exactly.)
Wait, isn’t Jacobian is just a bunch of gradients?