Hacker News new | ask | show | jobs
Automatic Differentiation in Machine Learning: A Survey (2018) [pdf] (jmlr.org)
83 points by lambdamore 2664 days ago
2 comments

I'm trembling with excitement at the prospect that easy-to-use high-performance automatic differentiation looks likely to become a "must have" capability for more and more computer languages. It's going to become easier and easier to specify objective functions and have the computer optimize programs for a wider range of application.
I work on non-linear optimization problems for most of my work projects, and I know exactly what you mean. One of these years we'll be freed from remembering calculus I :)
right after the year of the Linux desktop, eh?
The idea of duality is insane. Its present in linear logic, automatic diff, constructive mathematics, probability, quantum theory, game design, discrete optimization, etc.

I wonder if it's related to chirality.

I think characterizing duality in this way is kind of superfluous, because the only way all those meanings of duality are the same is in the most abstract sense of the word.

In other words, lots of things have duals. But the duality between any given pair of things doesn't necessarily expose any deep, fundamental connection to another pair of things which have duality. So it's not that duality features so heavily throughout mathematics as its own concept; rather, we frequently build new theories to tie these things together. It's helpful to be able to translate things from one context to another context.

We could just as easily say that isomorphisms are insane because they feature heavily throughout mathematics. But I don't think that provides a deep insight, because it's not like an isomorphism is a special property that ties a bunch of mathematics together in a grand way. Specific pairs of things can be isomorphic. Likewise specific pairs of things can be duals.

Any given pair of dual things is its own duality. It doesn't necessarily have anything to do with the way another pair of objects is in duality. The terminology here is semantically convenient for intuition, but it's definitely overloaded. I think the commonalities you're seeing here are simply due to the vast utility of linearity in all of those disciplines.

> I think characterizing duality in this way is kind of superfluous,

It's an analysis done out of necessity. These dualities might not be a 100% in every case, but maybe I care about the ways in which they are similar.

> because the only way all those meanings of duality are the same is in the most abstract sense of the word.

So is a monad. Do you think that in the future, the level of abstraction in mathematics is going to increase or decrease?

It will increase, which I guess is sort of my point. We already know there's a lot of abstraction. If these things are only alike semantically (two pairs of dual things can be completely unrelated), what does it gain you to point out they've everywhere?

I don't mean to be obtuse, but it strikes me as saying that a city is full of concrete.

You can think of it as a really nice intermediate language.

It might be hard to build a computer system that lets you reason about both probability and say quantum mechanics.

However it might be easier to build a system that phrases a probabilistic problem in terms of duality and then solves it.

Like why should each of this have it's own foundation when there's one that captures a lot of them?

Those are a bunch of different ideas. It's the word "duality" that is insane(ly overloaded).
They aren't, the underlying idea is the same, or very similar. Pick two of them and I'll provide reference on the connection.
Okay, automatic differentiation and probability.
https://math.stackexchange.com/questions/2593296/why-is-prob...

Automatic differentiation is something that you get for free if you use dual numbers. Lie theory describes the relationship between the discrete and continuous spaces. Probability has this deep connection to Lie groups.

To give you some intuition (and I'm really rephrasing the stackexchange post above), the only way you can only measure randomness (or generate randomness) is if each draw has a "reference" to some global object that has the global, normalized view of the discrete space.

Are you familiar with Vovk's foundations of probability? That brings you from probability to game semantics. Duality is right next to it.

Wait, what?

> Automatic differentiation is something that you get for free if you use dual numbers. Lie theory describes the relationship between the discrete and continuous spaces. Probability has this deep connection to Lie groups.

This...doesn't follow. Probability has a connection to Lie groups because it's fundamentally analytic ("continuous"). But you haven't explained how you make the connection to the dual numbers.

What you're showing here is that a lot of things in mathematics can be described analytically (and saying that would be likewise pretty superfluous). But just because you're working with continuous spaces doesn't mean you've engaged the duals. It generally means you're using the reals.

This gets to the heart of what I'm saying - if I wanted to be flippant I could have said the real numbers, or continuity, or analysis, etc are at the heart of so many distinct subfields of mathematics. It doesn't mean quite a lot.

Duality features in a lot of different parts of mathematics, but that doesn't mean you can productively draw connections between dual things in one area and dual things in another. I'm not seeing how you get from dual numbers to Lie groups.

Discrete optimization and automatic differentiation.
Forward AD is the pushforward of a tangent vector (an element of the tangent space), Reverse AD is a pullback of a cotangent vector (an element of the cotangent space). The duality notion between tangent and cotangent spaces is the same as the duality notion of spaces in optimization. Unfortunately, I'm only passingly familiar with discrete optimization, but I would suspect the notion extends from optimization. That's not to say that they are fundamentally the same or that writing this down helps anybody in any way, but a lot of these "dual" notions do have some sort of dual vector space under the hood.
Yeah, but all you're really describing here is linear algebra. Vector spaces and linearity are a significant part of every single discipline the grandparent commenter mentioned, but they picked out duality.

I would agree with the critique: I don't think highlighting duality here is particularly useful. For example, the way dual numbers are used to extend the reals for automatic differentiation doesn't have a deep connection to duality in vector spaces. It's just a very general semantic concept that describes pairs of things. But it doesn't say that any given pair of dual things is related to another pair of dual things.

Gimme five and I'll answer two. There's quite a few pairwise permutations and some are easier to understand and more instructive than others.

Fundamentally, they are both connected via the idea of convex optimization. Automatic differentiation is a computational technique to solve optimization problems.

Yes optimization problems is very general however calculus is a fundamental tool. Dual numbers are somewhat like lie groups, very smooth and conducive to optimization.

Curious, can you expand on the connection to convex optimization? To my understanding, discrete optimization is nonconvex by nature due to discontinuities in the feasible space.
I think the dualities in quantum field theory are different from the dualities in optimization but maybe some category theorist can correct me if I’m wrong
Category theory isn't quite the correct formalism.
What would be then?
I like linear logic a lot. Not saying it solves everything but it's easier to understand and you don't really give up much of what category theory has.

Also alternating graphs or game semantics.

These are all related.

For those of us who are not aware of duality in auto differentiation (and haven't had a chance to read the above review), could you introduce the idea? Are you talking about forward mode vs reverse mode -- since I haven't pondered, what's interesting/deep about that?
AD relies on dual numbers. Dual numbers are more suited for doing calculus.

Structurally, dual numbers are numbers of the form a + b * e (where e is epsilon s.t. e^2 = 0 but e != 0. Think of it as the imaginary constant but instead of i^2 = -1, you have e^2=0).

For example, multiplication of two dual numbers (a + b * e)(c + d * e) = (ac + ad * e + bc * e +bd * e^2). Since e^2 = 0, you end up with (ac + (ad + bc) * e).

Here comes the magic. Dual numbers let you evaluate a function and get the derivative at that point by just evaluating the function. In the above, example, if this was the result of a function, ac is the value of the function and ad + bc is the derivative of that function at that value.

https://blog.demofox.org/2014/12/30/dual-numbers-automatic-d...

Before looking at the wiki I thought, huh, Grassmann numbers on HN?
You are not wrong.
Edit: I think I answered a different question than I believed.

Been a while but my best crack at it: If you’re trying to minimize a function, you can call it the primal function. It will have n inputs and m constraints (like how many of each product should I buy constrained by budget and carrying capacity). You can flip the problem around into its dual formulation. This will be a function with m inputs and n constraints, and it will be a maximization problem.

If you can solve the dual formulation (global max), you know that the primal cannot possibly be lower than the dual. For some types of problems (convex), you can even guarantee that the global max of the dual is the global min of the primal.

The transformation between primal and dual formulations goes both ways, so the primal is the dual of the dual.

I'd add a very important thing - with your primal problem you are trying to e.g. minimize a function and you proceed the way that all your intermediate solutions are feasible. With dual approach you flip the direction, i.e. maximization in this case, but you start in completely infeasible solutions and hope to end up in the very first feasible solution that should be your optimum.

Now how does that relate to automatic differentiation I am not sure either.