by bionhoward 771 days ago
How could this help us understand the difference between the learned parameters and their gradients? Could the gradients ever coincide with the parameters themselves, the way the exponential function is its own derivative?
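To make that second question concrete, here is a rough toy sketch (my own setup, not anything from the linked work; `numerical_grad`, `exp_loss`, and `quadratic_loss` are made-up names for illustration). With a loss of sum(exp(theta_i)), the gradient equals exp(theta_i), so it matches each term of the loss rather than the parameter. With a loss of 0.5 * sum(theta_i^2), the gradient is exactly the parameter vector.

```python
import math

def numerical_grad(loss, theta, eps=1e-6):
    """Central-difference estimate of d(loss)/d(theta_i) for each i."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad

def exp_loss(theta):
    # Hypothetical loss: each term is its own derivative w.r.t. theta_i.
    return sum(math.exp(t) for t in theta)

def quadratic_loss(theta):
    # Hypothetical loss whose gradient is the parameter vector itself.
    return 0.5 * sum(t * t for t in theta)

theta = [0.5, -1.0, 2.0]  # arbitrary illustrative parameters
grad_exp = numerical_grad(exp_loss, theta)
grad_quad = numerical_grad(quadratic_loss, theta)

print("theta      :", theta)
print("grad (exp) :", grad_exp)   # tracks exp(theta_i), not theta_i
print("grad (quad):", grad_quad)  # tracks theta_i
```

So in this toy case the "gradient equals parameter" property comes from the quadratic, not the exponential: exp(x) = x has no real solution, so the exponential loss never gives a gradient equal to its own parameter.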