by jeeceebees
2388 days ago
Of course there is. All the building blocks that people are mixing and matching in networks nowadays were introduced at some point. The papers that introduced batch norm, adaptive instance norm, attention heads, or any other module used in a network each have an extensive discussion of the motivation for its existence, some derivation or proof that it does what you want, and an empirical test to show it helps in practice. The reason some losses allow GANs to converge in certain situations while others don't isn't a complete mystery; there is theory that supports this. Researchers designing new models are considering weak points in old approaches, identifying why they aren't working correctly, and proposing something new that solves a part of the problem. All of this is done by looking at the math behind all the operations in the network (or at least the parts relevant to a certain question). That nobody really knows how AI works is one of those myths told by the media. Just because the model weights aren't interpretable doesn't mean we don't know why that model works well. It just takes quite a bit of maths knowledge to really understand state-of-the-art models. All that knowledge is also conveniently packaged into modern frameworks that make it easy to use without a deep knowledge of why it works. All of this contributes to the feeling that nobody really knows what's going on, while in reality it's only the majority of people that don't know what's going on ;)
|
It's not a myth. No one really understands how neural networks work. We don't know why a particular model works well. Or why any model works well. For example, no one can answer why NNs generalize so well even when they have enough learning capacity to memorize all training examples. We can guess, but we don't know for sure. Most of the proofs you see in papers are there as filler, so that the papers seem more convincing. We can rarely prove anything mathematically about NNs that has any practical value or leads to any breakthroughs in understanding.
If we really did understand how NNs work, then we wouldn't need to do expensive hyperparameter searches - we would have a way to determine the optimal ones given a particular architecture and training data. And we wouldn't need to do expensive architecture searches, yet the best of the latest convnets have been found through neural architecture search (NAS), e.g. EfficientNet, and there's very little math involved in the process - it's pretty much just random search.
Funny you mentioned the batchnorm paper - we still don't know why batchnorm is so effective. The paper gave an explanation (covariate shift reduction) which was later shown to be wrong (batchnorm does not reduce it), then several other explanations were suggested (smoother loss surface, easier gradient flow, etc.), but we still don't know for sure. Pretty much every good idea in the NN field is a result of lots of experimentation, good intuition developed in the process, looking at how the brain does it, and practical constraints. And yes, sometimes we're looking at the equations, and thinking hard, and sometimes we see a better way to do stuff. But usually it starts with empirical tests, and if they are successful, some math is used in an attempt to explain things. Not the other way around.
NNs are currently at a point similar to where physics was before Newton and before calculus.