by casualscience 753 days ago
I'm confused about the use cases for this. Are you saying that if I want to fit some "magic numbers" in my C++ program, I can now do that by pulling in DiscoGrad, wrapping those numbers with some code that says "please fit these", and then adding some test cases somewhere?

(I am one of the authors) Thanks for your question. Yes, it is similar to what you describe, but not quite. The prime use case is to apply DiscoGrad together with a gradient descent optimizer to optimization problems. For a C++ program to be regarded as such a problem, you have to define what the "inputs" are, and the program has to return some numerical value (a loss) that is to be maximized or minimized. The tool then delivers a "direction" (a smoothed gradient), which gradient descent can use to adjust the inputs toward a local optimum.

So if you can express your test cases numerically and make the placeholders for the "magic numbers" visible to the tool by treating them as "inputs" (which should generally be possible), this could be a use case. Hope this clarifies it.
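
To make the setup concrete, here is a minimal sketch of "fit a magic number against test cases" written as an optimization problem. It does not use DiscoGrad's API: `TestCase`, `loss`, `gradient` and `fit` are hypothetical names, the "program" is just `theta * input`, and the gradient is a plain central finite difference standing in for the smoothed gradient the tool would deliver.

```cpp
#include <cmath>
#include <vector>

// Hypothetical test case: the program should map `input` to `expected`.
struct TestCase {
    double input;
    double expected;
};

// The "program": theta is the magic number (the input to be optimized),
// and the return value is the loss, here the mean squared error over the tests.
double loss(double theta, const std::vector<TestCase>& cases) {
    double sum = 0.0;
    for (const auto& c : cases) {
        double err = theta * c.input - c.expected;
        sum += err * err;
    }
    return sum / static_cast<double>(cases.size());
}

// Stand-in for the gradient (assumption: central finite difference,
// not the smoothed gradient DiscoGrad computes).
double gradient(double theta, const std::vector<TestCase>& cases, double h = 1e-5) {
    return (loss(theta + h, cases) - loss(theta - h, cases)) / (2.0 * h);
}

// Plain gradient descent: repeatedly step against the gradient.
double fit(double theta, const std::vector<TestCase>& cases,
           double learning_rate = 0.05, int steps = 500) {
    for (int i = 0; i < steps; ++i) {
        theta -= learning_rate * gradient(theta, cases);
    }
    return theta;
}
```

With test cases such as (1, 3), (2, 6), (3, 9), calling `fit(0.0, cases)` moves `theta` toward 3, the value that makes every test pass.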

No, the use cases for this are similar to regular autodiff, where you implement a function f(x) and the library helps you automatically compute derivatives such as the gradient g(x) := ∇f(x). Various autodiff methods differ in how they accomplish this, and the library shared here uses a code-generation approach where it performs a source-to-source transformation to generate source code for g(x) based on the code for f(x).
You are right that the use cases are very similar to regular autodiff, with the added benefit that the returned gradient also accounts for the effects of taking alternative branches.
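
A small sketch of what "accounting for alternative branches" can mean, under an assumption the thread does not spell out: that smoothing means taking the expected output when Gaussian noise with standard deviation `sigma` is added to the input. The names `program`, `local_derivative`, `smoothed_program` and `smoothed_derivative` are hypothetical, and the smoothed value is worked out in closed form for this one toy program rather than computed by any tool.

```cpp
#include <cmath>

// Toy program with a branch on its input: a step at x = 0.
double program(double x) {
    if (x < 0.0) {
        return 0.0;
    }
    return 1.0;
}

// Derivative seen along the branch that is actually taken, estimated by a
// central finite difference. Away from the jump both evaluations land in the
// same branch, so the result is 0 and gives gradient descent no direction.
double local_derivative(double x, double h = 1e-6) {
    return (program(x + h) - program(x - h)) / (2.0 * h);
}

// Expected output of program(x + sigma * eps) with eps ~ N(0, 1).
// For the step function this is P(x + sigma * eps >= 0) = Phi(x / sigma).
double smoothed_program(double x, double sigma) {
    return 0.5 * std::erfc(-x / (sigma * std::sqrt(2.0)));
}

// Derivative of the smoothed output: the Gaussian density at x, which is
// nonzero everywhere and largest near the branch point.
double smoothed_derivative(double x, double sigma) {
    const double pi = std::acos(-1.0);
    double z = x / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
}
```

At `x = 0.3` the local derivative is 0, while the smoothed derivative with `sigma = 1` is positive, reflecting that a small change in `x` changes how likely each branch is.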

Just to clarify: we do a kind of source-to-source transformation by transparently injecting some API calls in the right places (e.g., before branching statements) before compilation. However, we don't generate separate source code for the gradient; the compiled program returns the program output alongside the gradient.

For the continuous parts, the AD library that comes with DiscoGrad uses operator overloading.
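
For readers unfamiliar with the term, this is roughly what operator-overloading AD looks like. The sketch below is a generic forward-mode example with a hypothetical `Dual` type, not the AD library that ships with DiscoGrad: each value carries its derivative along, and the overloaded operators apply the sum and product rules.

```cpp
// A value together with its derivative with respect to one chosen input.
struct Dual {
    double val;  // function value
    double der;  // derivative
};

// Sum and difference rules.
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }

// Product rule: (ab)' = a'b + ab'.
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Example function f(x) = x^3 - 2x + 1, written as ordinary arithmetic.
Dual f(Dual x) {
    Dual two{2.0, 0.0};  // constants have derivative 0
    Dual one{1.0, 0.0};
    return x * x * x - two * x + one;
}

// Seed the input with derivative 1 to get f(x) and f'(x) in one pass.
Dual evaluate_with_derivative(double x) {
    return f(Dual{x, 1.0});
}
```

Calling `evaluate_with_derivative(2.0)` yields the value 5 and the derivative 10, matching f'(x) = 3x^2 - 2 at x = 2.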

> with the added benefit that the returned gradient also accounts for the effects of taking alternative branches.

Does this mean that you can take the partial derivative with respect to some boolean variable that is used in an if (for example), whereas with regular autodiff you can't?

I'm struggling to understand why regular autodiff works even in the presence of this limitation. Is it just a crude approximation of the "true" derivative?