Hacker News new | ask | show | jobs
by dwiel 4411 days ago
Interesting to note though that even with a linear network that can be represented by a single matrix, it can be faster, easier and converge to better results with multiple layers because the different gradient and parameter space that is presented to the optimization algorithm.