


	by sangfroid_bio 2222 days ago
	How does the differentiable programming implementation work?

1 comments

tlb 2222 days ago

Yoga is purely functional, so it's possible to use backpropagation to compute gradients efficiently.

It actually compiles two versions of each function: call them foo and foo.grad. foo.grad takes the same arguments as foo, plus a gradient for each output argument. It then computes a gradient for each input argument.
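To make the foo / foo.grad pairing concrete, here is a hand-written sketch of what such a pair could look like in plain C++. The function body, the name foo_grad, and the parameter names are all made up for illustration; this is not what the Yoga compiler emits.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical forward function: out = a * b + sin(a).
double foo(double a, double b)
{
  return a * b + std::sin(a);
}

// Hypothetical gradient companion. It takes the same inputs as foo, plus the
// gradient arriving at foo's output (outGrad), and accumulates one gradient
// per input into aGrad and bGrad.
void foo_grad(double a, double b, double outGrad, double &aGrad, double &bGrad)
{
  aGrad += outGrad * (b + std::cos(a)); // d(out)/da = b + cos(a)
  bGrad += outGrad * a;                 // d(out)/db = a
}
```

The gradients are accumulated with += rather than assigned, on the assumption that an input may feed several outputs and each contribution has to be summed.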

The algorithm is simple: traverse the expression tree in the usual order you'd use for emitting code, and remember the order. Then traverse in reverse order, propagating gradients as you go.
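A minimal sketch of that forward-then-reverse traversal, assuming a flat tape of nodes and numeric values instead of Yoga's expression tree. Tape, Node, and Op are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node kinds, not Yoga's actual classes.
enum class Op { Input, Add, Mul };

struct Node {
  Op op;
  int lhs;      // tape index of the first argument, -1 for inputs
  int rhs;      // tape index of the second argument, -1 for inputs
  double value; // result of the forward evaluation
  double grad;  // gradient accumulated during the reverse pass
};

// Nodes are appended in evaluation order, so the tape remembers that order.
struct Tape {
  std::vector<Node> nodes;

  int input(double v)
  {
    nodes.push_back({Op::Input, -1, -1, v, 0.0});
    return last();
  }
  int add(int a, int b)
  {
    nodes.push_back({Op::Add, a, b, nodes[a].value + nodes[b].value, 0.0});
    return last();
  }
  int mul(int a, int b)
  {
    nodes.push_back({Op::Mul, a, b, nodes[a].value * nodes[b].value, 0.0});
    return last();
  }

  // Walk the tape in reverse, pushing each node's gradient to its arguments.
  void backprop(int out, double outGrad)
  {
    for (Node &n : nodes) n.grad = 0.0;
    nodes[out].grad = outGrad;
    for (int i = out; i >= 0; --i) {
      Node const &n = nodes[i];
      switch (n.op) {
        case Op::Input:
          break;
        case Op::Add:
          nodes[n.lhs].grad += n.grad;
          nodes[n.rhs].grad += n.grad;
          break;
        case Op::Mul:
          nodes[n.lhs].grad += n.grad * nodes[n.rhs].value;
          nodes[n.rhs].grad += n.grad * nodes[n.lhs].value;
          break;
      }
    }
  }

  int last() const { return static_cast<int>(nodes.size()) - 1; }
};
```

For f = (x + y) * x, you would record x and y with input(), build the sum and product with add() and mul(), then call backprop on the product's index with an output gradient of 1. Because every node appears after its arguments on the tape, the reverse walk has a node's full gradient in hand before it propagates it further.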

The tedious bit is writing the gradients for every built-in op. For an operation like + it's simple: each argument gets the same gradient as the result:

  void ExprAdd::backprop(YogaContext const &ctx, GradientExprSet &grads, ExprNode *g)
  {
    grads.addGradient(ctx, args[0], g);
    grads.addGradient(ctx, args[1], g);
  }

For something like divide it's a bit grosser:

  void ExprDiv::backprop(YogaContext const &ctx, GradientExprSet &grads, ExprNode *g)
  {
    // https://en.wikipedia.org/wiki/Quotient_rule
    grads.addGradient(ctx, args[0], ctx.mkExpr<ExprDiv>(g, args[1]));
    grads.addGradient(ctx, args[1], ctx.mkExpr<ExprNeg>(
      ctx.mkExpr<ExprMul>(g,
        ctx.mkExpr<ExprDiv>(
          args[0],
          ctx.mkExpr<ExprPow>(
            args[1],
            ctx.mkExpr<ExprConstDouble>(2.0))))));
  }
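One way to sanity-check a rule like this is to compare it against a finite-difference estimate. The sketch below redoes the same arithmetic on plain doubles (g / b for the numerator, -(g * (a / b^2)) for the denominator) rather than building expression nodes. DivGrads, divBackprop, numericDa, and numericDb are hypothetical helpers, not part of Yoga.

```cpp
#include <cassert>
#include <cmath>

struct DivGrads {
  double da; // gradient with respect to the numerator a
  double db; // gradient with respect to the denominator b
};

// Same formulas as the expression built above, evaluated numerically.
DivGrads divBackprop(double a, double b, double g)
{
  return {g / b, -(g * (a / std::pow(b, 2.0)))};
}

// Central finite differences of f(a, b) = a / b, used only as a cross-check.
double numericDa(double a, double b, double h = 1e-6)
{
  return ((a + h) / b - (a - h) / b) / (2.0 * h);
}

double numericDb(double a, double b, double h = 1e-6)
{
  return (a / (b + h) - a / (b - h)) / (2.0 * h);
}
```

With a = 6, b = 3, and g = 1, the analytic values are 1/3 and -2/3, and the finite-difference estimates should agree to within a small tolerance.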

More at https://gitlab.com/umbrellaresearch/yoga2/-/blob/master/jit/... if you're curious.


sangfroid_bio 2222 days ago

Have you considered using an off-the-shelf differentiable programming implementation? Or are the requirements for real-time applications too demanding for existing software?


tlb 2222 days ago

I looked at a few systems but didn't think any of them would work.

The real-time requirement is fairly hard. The robots I work on use a 1 kHz feedback loop, so each cycle has 1 ms in which to recalculate everything.

Caffe is pretty efficient if you can supply batches of values, but the overhead is high when running real-time.


EquallyJust 2222 days ago

Have you tried TorchScript with the C++ interface?


tlb 2222 days ago

No. It looks promising. I should try it.


sangfroid_bio 2222 days ago

I think TensorFlow with Swift or Rust bindings may also work.


mistrial9 2221 days ago

Has umbrellaresearch considered changing Yoga to a different name? Start with my +1.
