by albertzeyer 1964 days ago
Hey, very interesting work!

CPython is built in C. Can you differentiate through that? I.e., would Python programs then also become differentiable, similar to JAX?

How much control do you have over the gradient? In some cases, it can be useful to explicitly define a custom gradient, or to stop the gradient, or to change the gradient, etc.

Can you define gradients on integral types (int, char)?


Regarding differentiating Python via CPython: theoretically yes, though in practice it is probably wiser to use something like Numba, which compiles Python to LLVM directly, to avoid a bunch of abstraction overhead that would otherwise have to be differentiated through. Also, fun fact: JAX can be told to simply emit LLVM, and we've used that as an input for tests :)

You can explicitly define custom gradients by attaching metadata to the function that should get the custom gradient (Enzyme will use it even if it could differentiate the original function).
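To make the idea of "the custom gradient wins even when the function could be differentiated" concrete, here is a minimal sketch in plain Python. This is not Enzyme's API: `Dual`, `custom_derivative`, `stop_gradient`, and `derivative` are hypothetical names for a toy forward-mode differentiator, used only to show the concept.

```python
class Dual:
    """A value paired with its derivative (toy forward-mode AD)."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def custom_derivative(deriv):
    """Hypothetical decorator: use `deriv` instead of differentiating the body."""
    def wrap(fn):
        def wrapped(x):
            if isinstance(x, Dual):
                # The body only ever sees the plain value; the derivative
                # comes entirely from the user-supplied rule (chain rule).
                return Dual(fn(x.val), deriv(x.val) * x.dot)
            return fn(x)
        return wrapped
    return wrap


def stop_gradient(x):
    """Keep the value but drop the derivative."""
    return Dual(x.val, 0.0) if isinstance(x, Dual) else x


def derivative(f, x):
    """d f / d x at x, by seeding the input derivative with 1."""
    return f(Dual(float(x), 1.0)).dot


# Assumed example: a "straight-through" rule that pretends d/dx round(x) = 1.
@custom_derivative(lambda x: 1.0)
def ste_round(x):
    return float(round(x))


def f(x):
    return ste_round(x) * x      # derivative: 1 * x + round(x) * 1


def g(x):
    return stop_gradient(x) * x  # derivative: only the second factor counts
```

With these definitions, `derivative(f, 2.3)` uses the hand-written rule for `ste_round` rather than the true (zero) derivative of rounding, and `derivative(g, 2.3)` gives half of what `x * x` would give because one factor has its gradient stopped.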

Integral types: mayyybe, depending on what exactly you mean. I can imagine using custom gradient definitions to specify how an integral type can be used in a differentiable way (say, representing a fixed-point number). We don't support differentiating integral types by approximating them as continuous values, if that's what you're asking. There's no reason why we couldn't add this (besides perhaps bit tricks being annoying to differentiate), but we haven't come across a use case.
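As a rough illustration of the fixed-point case, the sketch below does the arithmetic on integers and supplies the derivative by hand, in terms of the real number the integer stands for. The Q16.16 layout and all function names are assumptions made for this example, not anything Enzyme provides.

```python
FRAC_BITS = 16            # assumed Q16.16 fixed-point layout
SCALE = 1 << FRAC_BITS


def to_fixed(x):
    """Encode a real number as a scaled integer."""
    return int(round(x * SCALE))


def to_float(q):
    """Decode a scaled integer back to the real number it represents."""
    return q / SCALE


def fixed_mul(a, b):
    """Multiply two fixed-point integers (integer multiply, then shift)."""
    return (a * b) >> FRAC_BITS


def fixed_square_with_grad(q):
    """Square a fixed-point value and return a hand-written derivative.

    The shift in fixed_mul has no useful derivative of its own, so the
    gradient is defined on the represented real value: d(x^2)/dx = 2x.
    """
    out = fixed_mul(q, q)
    grad = 2.0 * to_float(q)
    return out, grad
```

For example, `fixed_square_with_grad(to_fixed(1.5))` returns the fixed-point encoding of 2.25 together with the gradient 3.0, even though every intermediate value in the forward computation is an `int`.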

Yea, I had this very rough (maybe crazy) idea in mind:

Once you can differentiate through CPython, and let's say you can also differentiate integral types via some approximation, and you have a bug in some Python code and a failing test case in Python, you can use the output (e.g. the exception from the failing test) as an error signal and backpropagate to the Python program code. The Python program code is represented as a chunk of bytes. If there is some meaningful gradient, it could point you to possible source code locations where the bug might be.

Probably the gradient will be quite meaningless though, and that's why the idea does not really work in practice. But I think for some simple examples, it still might work.

Because the code has many possible branches, getting a good approximate gradient would require visiting several of them, perhaps via some Monte Carlo (MC) sampling.
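One possible reading of "MC sampling over branches" is Gaussian smoothing: perturb the input randomly so that both sides of a branch get visited, and estimate the gradient of the averaged output. The sketch below is only that interpretation, with hypothetical names (`step`, `smoothed_grad`) and a fixed seed for reproducibility.

```python
import random


def step(x):
    # A single branch: the exact derivative is 0 everywhere except at x == 0,
    # so an ordinary gradient carries no information about the branch.
    return 1.0 if x > 0 else 0.0


def smoothed_grad(f, x, sigma=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of d/dx E[f(x + sigma * eps)], eps ~ N(0, 1).

    Uses the identity d/dx E[f(x + sigma * eps)] = E[f(x + sigma * eps) * eps] / sigma,
    which only needs evaluations of f, not its derivative.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        total += f(x + sigma * eps) * eps
    return total / (n_samples * sigma)
```

Near the branch point, `smoothed_grad(step, 0.0)` comes out clearly positive (the exact smoothed derivative there is 1/sqrt(2*pi), about 0.399), while far from it, e.g. `smoothed_grad(step, 5.0)`, the estimate is close to zero. The estimator is noisy, which fits the worry above that the resulting gradient may not be very meaningful for real programs.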