by tagrun 1514 days ago
In the context of neural networks combined with differential equations (which appears to be the original poster's field), the trade-off depends on the problem: https://diffeqflux.sciml.ai/dev/ControllingAdjoints/

yeah, my systems are really small in comparison (1-20) but involve higher-order derivatives (up to 4th order), so reverse-mode AD isn't the best fit for them
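To illustrate why forward mode can suit small systems with high-order derivatives, here is a minimal sketch of truncated Taylor-mode forward AD in Python. It is not the Julia/SciML tooling from the linked page. One forward pass through the function carries all derivatives up to order K, whereas reaching 4th order with reverse mode generally means nesting it. The `Taylor` class and its methods are hypothetical names for illustration, and only `+`, `-`, `*` and `exp` are supported.

```python
from math import factorial, exp


class Taylor:
    """Truncated Taylor series (forward mode): c[k] = f^(k)(x0) / k!."""

    def __init__(self, coeffs):
        self.c = [float(v) for v in coeffs]

    @staticmethod
    def variable(x0, order):
        # Seed the independent variable as x(t) = x0 + t, truncated at t^order
        return Taylor([x0, 1.0] + [0.0] * (order - 1))

    def _lift(self, other):
        # Promote a plain number to a constant series of the same length
        if isinstance(other, Taylor):
            return other
        return Taylor([other] + [0.0] * (len(self.c) - 1))

    def __add__(self, other):
        o = self._lift(other)
        return Taylor([a + b for a, b in zip(self.c, o.c)])

    __radd__ = __add__

    def __neg__(self):
        return Taylor([-a for a in self.c])

    def __sub__(self, other):
        return self + (-self._lift(other))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        n = len(self.c)
        # Truncated Cauchy product: (ab)_k = sum_{j=0..k} a_j * b_{k-j}
        return Taylor([sum(self.c[j] * o.c[k - j] for j in range(k + 1))
                       for k in range(n)])

    __rmul__ = __mul__

    def exp(self):
        # From e' = u' * e: k * e_k = sum_{j=1..k} j * u_j * e_{k-j}
        n = len(self.c)
        e = [exp(self.c[0])] + [0.0] * (n - 1)
        for k in range(1, n):
            e[k] = sum(j * self.c[j] * e[k - j] for j in range(1, k + 1)) / k
        return Taylor(e)

    def derivatives(self):
        # Undo the 1/k! normalization to get [f, f', f'', ...]
        return [factorial(k) * a for k, a in enumerate(self.c)]
```

For example, with `x = Taylor.variable(2.0, 4)`, `(x * x * x * x).derivatives()` returns `[16.0, 32.0, 48.0, 48.0, 24.0]`, i.e. f, f', f'', f''', f'''' of x^4 at x = 2. Each operation costs O(K^2) in the truncation order, and forward mode needs one pass per input direction. That is cheap when there are only a handful of state variables.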