Hacker News
by SleekEagle 1515 days ago
Ultimately it comes down to gradient descent (which is pretty magical in its own right), but what's most surprising to me is that the loss landscape is actually organized enough to yield impressive results. Obviously the difficulties of training large NNs are well-documented, but I'm surprised it's even as easy as it is.
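
To make the point concrete, here's a minimal sketch of plain gradient descent on a toy 1-D non-convex loss. The loss function, learning rate, step count, and starting points are all made up for illustration and have nothing to do with any real network; the only point is that the update rule just follows the local slope, so where you end up depends on where you start.

```python
# Toy illustration only: a hypothetical 1-D loss with two local minima.
def loss(w):
    return w**4 - 3 * w**2 + w

def grad(w):
    # Analytic derivative of the toy loss above.
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=2000):
    # Repeatedly step against the gradient from the starting point w0.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Two different starting points settle into two different minima.
w_left = gradient_descent(-2.0)
w_right = gradient_descent(2.0)
print(w_left, loss(w_left))
print(w_right, loss(w_right))
```

Even in this tiny case one start lands in a worse minimum than the other, which is roughly why it's surprising that the same simple rule works as well as it does in very high-dimensional landscapes.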