| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by SoerenL 2776 days ago
	As one of the authors: Yes, it is true, 3 orders of magnitude (so about a factor of 1000 on GPUs). But please be careful, this holds only for higher order derivatives (like Hessians, or Jacobians), as stated in the paper and as the title says. For gradients, the speedup is rather limited.

3 comments

woggy 2776 days ago

Can you briefly explain what the approach is? I’ve briefly skimmed the paper and I feel like I am missing something. Is it auto diff on the component wise expressions, then somehow figuring out a way to evaluate those expressions using matrices?

link

SoerenL 2776 days ago

That's what is usually done, autodiff on the component wise expression. We don't do it here. Instead, we really compute on the matrix and tensor level and compute derivatives here directly. Let me give you a simple example to illustrate it: Consider the function f(x)=x'Ax. Then, its Hessian is A+A'. This expression is what we compute. And evaluating this expression is orders of magnitude faster than what TF, PyTorch, etc. do, namely autodiff on the components. Hope this helps, if not, please let me know.

link

goolulusaurs 2776 days ago

Do you think this could end up being implemented in TF and Pytorch?

link

SoerenL 2776 days ago

Not in its current formulation. It uses a different representation of the tensors. However, a new version/algorithm that will be available in a few months can be used in TF and PyTorch.

link

jkam 2775 days ago

Are you referring to the Ricci notation when you are saying it uses a different representation of the tensors? Do you also plan to add non-differentiable functions like relu?

link

SoerenL 2773 days ago

Yeah, I am referring to Ricci notation. TF and PyTorch don't use it. The new version already has relus. This version is targeted at standard formulas (has abs as a non-differentiable function), next version works for deep learning. Never wanted to work on this but there is just no way around it.

link

nl 2776 days ago

Can this be used for faster earth mover distance calculation?

link

SoerenL 2776 days ago

Do you need Hessians or Jacobians for computing the earth mover distance, then a definite yes. Otherwise, I would doubt it (though I do not know exactly.)

link

p1esk 2776 days ago

Wait, isn’t Jacobian is just a bunch of gradients?

link