by jmvalin
3181 days ago
The main problem here is that you're depending on implementation-specific behaviour. If you train on a device, you have to run on a device with exactly the same behaviour. On top of that, some FPUs have very slow (trapping) denormal handling. I'm also unsure how accurate the gradient computation can be when the signal itself has numerical issues. I don't deny it's a cool hack, but beyond that I don't think I see the point or the problem this is trying to solve.
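
To make the portability concern concrete, here is a minimal Python sketch. It assumes IEEE 754 doubles with gradual underflow and round-to-nearest-even, which is what CPython gives on common desktop hardware. The `flush_to_zero` and `halve_then_double` helpers are hypothetical: they are a crude software model of an FPU that flushes denormal results to zero, not any real device's API.

```python
import sys

TINY = 5e-324                     # smallest positive denormal double (2**-1074)
MIN_NORMAL = sys.float_info.min   # smallest positive normal double

def flush_to_zero(x):
    """Hypothetical model of an FPU that replaces denormal results with zero."""
    return 0.0 if 0.0 < abs(x) < MIN_NORMAL else x

def halve_then_double(x, ftz=False):
    """Scale x down by 2 and back up; exact for normal floats far from underflow."""
    half = x / 2
    if ftz:
        half = flush_to_zero(half)  # assumption: only the intermediate is flushed
    return half * 2

a = 3 * TINY                               # a denormal value, 3 units in the last place
gradual = halve_then_double(a)             # gradual underflow: rounding loses information
flushed = halve_then_double(a, ftz=True)   # modelled flush-to-zero: the value vanishes

print(gradual == a)      # the round trip is not the identity in the denormal range
print(gradual / TINY)    # size of the result in units of TINY
print(flushed)           # result under the flush-to-zero model
```

The same arithmetic gives three different answers depending on whether the value is normal, denormal with gradual underflow, or denormal on flush-to-zero hardware, which is why a model that relies on this range only behaves as trained on matching hardware.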