Hacker News new | ask | show | jobs
by dmurfet 3064 days ago
Author here. The theoretical background can be found in:

https://arxiv.org/abs/1407.2650

https://arxiv.org/abs/1701.01285

http://therisingsea.org/notes/MScThesisJamesClift.pdf

As neel_k notes, a good way to understand this picture is in terms of differential linear logic (a refinement of simply-typed differential lambda calculus). I did not provide references in the talk as unfortunately I did not understand the subject at the time, but the introduction to the second paper above hopefully serves to remedy that omission. Suffice to day that Ehrhard and Regnier discovered something quite profound in differential lambda calculus, the implications of which are still being explored; this discovery should have significant bearing on what people refer to as differentiable programming, I think.

The third link above is the master’s thesis of my talented student James Clift. We have subsequently extended that work and are finishing a paper which explains the computational content of derivatives of Turing machines, according to differential linear logic. This might be of interest to people here. I will post a link when we put it online.

5 comments

I find this stuff fascinating, at least in the application of differential calculus to things I thought non-continuous, differential grammars, differential regex, etc.

It seems like operating in the discrete domain was fine for small problems where problems could be brute forced, but if we want to get into an analytical regime for larger problems, differentiation is akin to recursion/induction in that it allows us to make tractable smaller problems. Is that roughly correct?

How would you recommend an undergrad bootstrap themselves on this subject? Are there patterns we can apply to transform discrete problems into differentiable problems?

I think it’s too early to say what differential lambda calculus or linear logic is “good for” in a practical sense. However, I don’t tend to think of it as being about breaking problems into smaller pieces.

I think it is more about error propagation: you can’t estimate which inputs to an algorithm contribute more to the final error, without some notion of derivative of an algorithm with respect to its inputs (even if those inputs are discrete, so that making infinitesimal variations does not obviously make sense).

Linear Logic is amazing-ly useful, http://homepages.inf.ed.ac.uk/wadler/topics/linear-logic.htm...

Rust [0] wouldn't exist without it.

Oddly enough, I think the next discontinuity in programming language design will be in making it easier to construct anytime [1], incremental [2] and succinct [3] programs. There is a tyranny of poor tolerancing that we currently have a very hard time from escaping.

Is it that differentiable programs can be made sloppy in a way that a non-differentiable one can't? Have you looked at the research of using DNN for physics simulations? [4]

[0] https://www.rust-lang.org/en-US/

[1] https://en.wikipedia.org/wiki/Anytime_algorithm

[2] https://en.wikipedia.org/wiki/Incremental_computing

[3] https://en.wikipedia.org/wiki/Succinct_data_structure

[4] https://www.youtube.com/watch?v=iOWamCtnwTc

What do you think of 'Programming with a Differentiable Forth Interpreter'[0] and 'A Neural Forth Abstract Machine'[1]? I don't know if they are related to your work, but both are on my to-read pile.

[0]: https://arxiv.org/abs/1605.06640 [1]: http://people.idsia.ch/~rupesh/rnnsymposium2016/files/bosnja...

I think these papers are fascinating! I would also recommend the Neural Turing Machine paper from DeepMind, which I know better than the papers you mention.

But keep in mind that this is a very young field, and there is no definitive point of view.

Fantastic stuff!

Does your work still keep the non-determinism of Ehrhard and Regnier?

This was the part that 'bothered' me, but it provided evidence of a link with process calculi.

I've also thought Ehrhard and Regnier's work was groundbreaking. It potentially opens up not just insights into differentiable programming, but a theory of concurrent computation and an algebraic theory of computation. My expectation is that it will reveal some deep links with other areas of mathematics, like group theory.

I think you are right, the meaning of the sums that appear in syntactic derivatives is quite subtle, and I don’t claim to have an authoritative answer as to their deep meaning.

Semantically, however, I think things are clearer. The denotation of the syntactic derivative of a proof is a limit, and the terms in the limit have an interpretation as probabilistic processes in a (close to) standard sense. So while the sum-over-linear-substitutions doesn’t have a clear probabilistic interpretation (at least that I can see) it is the limit of computational processes where you run the probability of making a substitution to zero.

I don’t know anything about process calculi, unfortunately!

P.S - Silly question, I can see it does. I responded before reading.
Sorry for the unrelated comment but I wanted to thank you for your nice lecture notes. I found them clear and very helpful during my studies.
Thanks! You’re welcome.
Oh, wow -- thanks especially for that second link! I was wondering about this recently.