by fpgamlirfanboy
811 days ago
> and if addition and multiplication work the way you expect and propagate derivatives correctly - then you are using dual numbers
you literally started out your miraculous comment with
> This new algebra is called the ring of "dual numbers." The difference is that instead of adding a new element "i" with i² = -1, we add one called "h" with h² = 0!
not some observation about caching derivatives. so i'll repeat myself for the 3rd time: there are no magical numbers anywhere in pytorch or tensorflow or caffe or any other serious autodiff implementation that abide by the rules you so jubilantly exclaim about.
|
Dual numbers help us differentiate automatically when the functions themselves are implemented as analytic power series that we have to compute explicitly, term by term, without accelerator help. In such cases we can indeed use them. But to your point, serious forward AD engines need to differentiate functions that the accelerator hardware computes in one shot.
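To make that distinction concrete, here is a minimal pure-Python sketch. Everything in it (the `Dual` class, `exp_series`, the choice of 30 terms) is my own illustration and not taken from any AD library. It assumes the function is built only from `+` and `*`, so the h² = 0 rule carries the derivative through a truncated power series. The built-in `math.exp` stands in for an opaque primitive computed "in one shot", and the dual number cannot pass through it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """a + b*h with h*h = 0; `a` is the value, `b` the derivative part."""
    a: float
    b: float = 0.0

    @staticmethod
    def lift(x):
        # Treat a plain number as a constant: derivative part 0.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # (a + b h)(c + d h) = ac + (ad + bc) h, because h*h = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__


def exp_series(x, terms=30):
    """exp(x) as a truncated power series, using only + and *."""
    total = 1.0
    term = 1.0
    for n in range(1, terms):
        term = term * x * (1.0 / n)  # x**n / n!
        total = total + term
    return total


x = Dual(1.5, 1.0)  # seed the h-coefficient with dx/dx = 1
y = exp_series(x)
# y.a approximates exp(1.5); y.b approximates the derivative at 1.5,
# which for exp is also exp(1.5).

# An opaque primitive standing in for a fused hardware kernel:
# math.exp only accepts real numbers, so a Dual cannot flow through it.
try:
    math.exp(x)
    opaque_ok = True
except TypeError:
    opaque_ok = False
```

The series version differentiates itself for free. The opaque call needs someone to supply its derivative rule by hand, and that is the case an industrial engine has to handle for every primitive it supports.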
However, Mike makes a very valid counterpoint when he shows forward-mode AD in Torch. I believe a careful analysis of Torch's implementation here could bring this conversation to a productive and satisfying conclusion for all participants and our public audience.
My big question here is to what degree the implementers tried to respect the dual number approach. Did they implement a dual tensor class, for instance? Do they automatically lift some ordinary computations into dual tensor computations? I honestly have my doubts there.
I have confidence that we can get to the bottom of this. I think that Mike actually does care about automatic differentiation and would be receptive to discussing this subtle point: naive dual number implementations may not be enough for industrial-strength AD systems. That discussion needs clear code examples and clear reasoning about how dual numbers fail in important cases.