Wouldn't replacing the flow control statements with ML models slow it down too much? Do you have the ability to automatically estimate the appropriate model complexity for a given statement based on how hot it is?
We're doing something less expensive: essentially, whenever a branch is encountered, we collect certain statistics of the branch condition and its derivatives, and the overall gradient is computed from those.
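To make that idea concrete, here is a toy sketch in Python. It is not DiscoGrad's actual estimator or API, just an illustration of how a gradient across a branch can be recovered from statistics of the condition. It assumes the input is perturbed with Gaussian noise, the program is a single `if cond(x) > 0` returning one of two constants, and the sampled condition values are roughly Gaussian. All names (`branch_gradient`, `cond`, `dcond`, `sigma`) are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def branch_gradient(x, cond, dcond, y_then, y_else, sigma=0.5, n=10_000, seed=0):
    """Estimate d/dx E[f(x + sigma*eps)] with eps ~ N(0, 1), where
    f(v) = y_then if cond(v) > 0 else y_else (assumed toy program)."""
    rng = random.Random(seed)
    samples = [x + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]

    # Statistics gathered at the branch: condition values and their derivatives.
    c = [cond(s) for s in samples]
    dc = [dcond(s) for s in samples]
    mu, sd = statistics.fmean(c), statistics.stdev(c)

    # Gaussian approximation of the condition: P(cond > 0) ~= Phi(mu / sd).
    # Moving x shifts mu by roughly mean(dc), which gives
    # dP/dx ~= pdf(mu / sd) / sd * mean(dc).
    dp_dx = normal_pdf(mu / sd) / sd * statistics.fmean(dc)

    # The smoothed output is y_else + (y_then - y_else) * P(cond > 0).
    return (y_then - y_else) * dp_dx


# Step function at 0: the unsmoothed derivative is 0 almost everywhere.
x0, sigma = 0.2, 0.5
estimate = branch_gradient(x0, cond=lambda v: v, dcond=lambda v: 1.0,
                           y_then=1.0, y_else=0.0, sigma=sigma)

# Closed form for this linear condition, for comparison.
exact = normal_pdf(x0 / sigma) / sigma

print(f"estimate={estimate:.4f} exact={exact:.4f}")
```

The program is still executed as ordinary code on each sample; no model stands in for the branch. The only extra work is tracking the condition and its derivative, which is why this is much cheaper than replacing control flow with a learned surrogate.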
We mention neural networks because DiscoGrad lets you combine branching programs with neural networks (via Torch) and jointly train/optimize them.