by Cacti
3047 days ago
It's rare that anyone actually knows why a hyperparameter or architecture choice works. We sometimes pretend to, but frankly the explanations are mostly made up after the fact to cover for how much of ML research is intuitive guesswork and trial and error. Loss surfaces also vary: changing the dataset, or even just the input size, alters the loss surface and can easily break a model. It's not called Gradient Descent by Grad Student for nothing.