Hacker News new | ask | show | jobs
by barfbagginus 811 days ago
See my comment to Mike. I think you're making a valid point here, which is that dual numbers by themselves are not powerful enough to automatically generate derivatives of arbitrary functions, especially given that those functions could be implemented in a FPU core, or using methods like lookup tables that don't lend themselves to dual number differentiation.

Dual numbers help us automatically differentiate things when the functions themselves are implemented as analytic power series that we have to explicitly compute without accelerator help. In such cases we can indeed use them. But to your point, serious forward AD engines need to differentiate functions that are computed in one shot by accelerator hardware.

However Mike makes a very valid counterpoint when he shows forward mode AD in Torch. I believe a careful analysis of Torch's implementation here could bring this conversation to a productive and satisfying conclusion for all participants and our public audience.

My big question here is to what degree did the implementers try to respect the dual number approach? Did they implement a dual tensor class for instance? Do they automatically lift some ordinary computations into dual tensor computations? I honestly have my doubts there.

I have confidence that we can get to the bottom of this. I think that Mike actually does care about automatic differentiation, and would be receptive to discussing this point of subtlety that naive dual number implementations may not be enough for industrial strength AD systems, with clear examples of code and clear reasoning as to how dual numbers fail in important cases.