Thanks, but could you briefly expand on what's happening in the minimal if (x >= 0) case with the discograd modification? What source code modification could the user could have made to achieve the same effect?
In DiscoGrad, smoothing would be applied by adding Gaussian noise with some configurable variance to x and running the program on those x's. The gradient would then be calculated based on the branch condition's derivative wrt. x (which is 1) and an estimate of the distribution of the condition (which is Gaussian).
In this specific example, the smoothed derivative happens to be exactly the Gaussian cumulative distribution function, so the user could just replace the program with that function. However, for more complex programs, it'd be hard to find such correspondences manually.
In this specific example, the smoothed derivative happens to be exactly the Gaussian cumulative distribution function, so the user could just replace the program with that function. However, for more complex programs, it'd be hard to find such correspondences manually.