| HN Mirror

I agree with that intuition. In our experience, it's easiest to see gains over other optimization techniques when the program is "branch-wise smooth and non-constant". Then, we get the full benefits of exact autodiff gradients "per branch", and our smoothing approach handles the branches. For SAT solving and other purely combinatorial problems, sufficiently accurate smoothing may indeed be too expensive. Also, in such problems, the average local minimum found via gradient descent may not always be that great. That said, we're still exploring where the limits really are.