|
|
|
|
|
by jkam
3133 days ago
|
|
I think XLA is trying to reduce the overhead introduced by backprop, meaning when you optimize the computational graph you might end up with an efficient calculation of the gradient (closer to the calculation you get with MC).
Regarding non-scalar valued functions: Don't you reduce a constrained problem to a series of unconstrained problems (via a penalty (or even augmented Lagrangian) or barrier method)? Then you only need the gradient of a scalar valued function to solve constrained problems. I imagine you can use the gradients of the constraints for solving the KKT conditions directly but this seems only useful if the system is not too big. But for sure it opens new possibilities. |
|