| HN Mirror

If you're implementing your own derivatives, you should probably use numerical methods to check that the gradient is computed correctly, and pytorch comes with tools to help with that https://pytorch.org/docs/stable/autograd.html#numerical-grad...

If you're reimplementing something for learning purposes, you can just compare the output you get with the existing implementation.

But as soon as you're trying to do something novel, no automatic test can tell you whether the model architecture or your implementation of it is at fault when you don't get the results you'd like, because there's nothing you could use for comparison.