|
|
|
|
|
by hansvm
233 days ago
|
|
They exist, but they're not that common (give or take the "expected" numerical deviations based on the order of summation and whatnot, which can both be nontrivial and propagate error further). Something I recommend doing, the best time being the start of the project and the second best time being now, is adding numerical gradient checking tests to all operations. You will make mistakes in your kernels from time to time, and it's valuable to know at a glance where those mistakes are. Mind you, it's possible to write both the forward pass and the backward pass in a way that's wrong but compatible. An additional layer of checks I like to add is a dead-simple implementation of all algorithms -- no vectorization, no fancy blocking or re-orderings, nothing. Compare results to the simple implementation. It sounds like a lot of work, but writing an optimized kernel is much slower than the numerical gradient checking and the simple kernel, and given how in numerical code it's basically impossible to identify the source of a bug without doing the equivalent of all of those checks, it only takes one bug in the whole project for the effort to pay off. |
|
I'll try to replace bits by simplified versions though, probably could help at least getting closer to knowing where the issue is.
Anyone have more debugging tips I'd greatly appreciate it! Nothing is too small or "obvious", as I'm about to lose my mind more or less.