by Cacti 3047 days ago
It is not often the case that someone actually knows why a hyperparam or architecture choice works. We pretend, sometimes, but frankly, it's mostly made-up junk to cover the fact that most ML research involves a huge amount of intuitive guesswork and trial and error.

And the loss surfaces vary. Even just changing the dataset, or only the input size, alters the loss surface and can easily break a model.
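A toy illustration of that last point (a sketch I made up, not taken from any real model): plain gradient descent on a one-parameter least-squares fit, where the only thing that changes is that the inputs get multiplied by 10. For the MSE loss here the curvature is 2·mean(x²), and gradient descent is only stable when lr < 2 / curvature. Scaling the inputs by 10 raises the curvature 100x, so a learning rate that converged nicely before now blows up. The data, the lr of 0.5, the step count, and the function names are all hypothetical, chosen just to show the effect.

```python
def mse_loss(w, xs, ys):
    # Mean squared error of the one-parameter model y_hat = w * x.
    total = 0.0
    for x, y in zip(xs, ys):
        r = w * x - y
        total += r * r  # r * r instead of r ** 2 so huge values become inf, not an OverflowError
    return total / len(xs)


def mse_grad(w, xs, ys):
    # dL/dw = mean(2 * x * (w * x - y))
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr, steps=50, w0=0.0):
    # Fixed-step gradient descent; returns the final weight and its loss.
    w = w0
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)
    return w, mse_loss(w, xs, ys)


def max_stable_lr(xs):
    # Curvature of the MSE loss is 2 * mean(x^2); GD diverges once lr exceeds 2 / curvature.
    curvature = 2 * sum(x * x for x in xs) / len(xs)
    return 2 / curvature


lr = 0.5  # hypothetical learning rate that happens to work on the first dataset

# Same underlying relationship (y = 3x) in both cases; only the input scale differs.
xs = [i / 10 for i in range(1, 11)]
ys = [3.0 * x for x in xs]
xs_big = [10 * x for x in xs]
ys_big = [3.0 * x for x in xs_big]

w_small, loss_small = gradient_descent(xs, ys, lr)
w_big, loss_big = gradient_descent(xs_big, ys_big, lr)

print(f"inputs ~1:  max stable lr {max_stable_lr(xs):.3f}, w = {w_small:.6f}, loss = {loss_small:.3g}")
print(f"inputs ~10: max stable lr {max_stable_lr(xs_big):.4f}, w = {w_big:.3g}, loss = {loss_big:.3g}")
```

Nothing about the "model" changed between the two runs, only the data going in, which is roughly why a hyperparam setting that someone swears by often doesn't transfer.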

It's not called Gradient Descent by Grad Student for nothing.