by frankling_
746 days ago
We're doing something less expensive: essentially, whenever a branch is encountered, the overall gradient is computed from certain statistics of the branch condition and its derivatives. We mention neural networks because DiscoGrad lets you combine branching programs with neural networks (via Torch) and jointly train/optimize them.
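To give a rough feel for what "statistics of the branch condition" can mean, here is a toy Python sketch. It is not DiscoGrad's estimator or API: it assumes a single branch `x < threshold`, an input perturbed by Gaussian noise with a known `sigma`, and constant branch bodies. All function and parameter names are made up for illustration. Under those assumptions, the expected output is smooth in the input mean, and its gradient comes out in closed form from the branch probability and its derivative.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def program(x, threshold=0.0, then_value=0.0, else_value=1.0):
    """Hypothetical branching program: a step function of x.

    Its ordinary derivative is zero almost everywhere, so plain
    automatic differentiation gives no useful signal.
    """
    if x < threshold:
        return then_value
    return else_value


def smoothed_value_and_grad(mu, sigma, threshold=0.0,
                            then_value=0.0, else_value=1.0):
    """Expected output and its gradient w.r.t. mu for x ~ N(mu, sigma^2).

    Assumption (not from the original comment): the branch condition is
    c(x) = x - threshold, so P(branch taken) = Phi((threshold - mu) / sigma).
    """
    z = (threshold - mu) / sigma
    p_then = normal_cdf(z)                # probability of taking the branch
    dp_then_dmu = -normal_pdf(z) / sigma  # derivative of that probability
    value = then_value * p_then + else_value * (1.0 - p_then)
    grad = (then_value - else_value) * dp_then_dmu
    return value, grad


if __name__ == "__main__":
    value, grad = smoothed_value_and_grad(mu=0.0, sigma=1.0)
    print(value, grad)
```

The point of the toy is only that the gradient is non-zero near the branch boundary even though `program` itself is flat on both sides. Real programs have many branches, non-constant bodies, and conditions that depend on intermediate values, which is where the actual machinery comes in.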