by runT1ME
2024 days ago
> suggests that the architecture doesn't matter, but we know that's not true. E.g., why are networks with skip connections SO much better than networks without? What about batch normalization?

Is this true though, or does network architecture only matter in terms of efficiency? This is non-rhetorical, I really don't know much about deep learning. :) I guess I'm asking whether, with enough data and compute, architecture is still relevant.
https://papers.nips.cc/paper/2018/file/a41b3bb3e6b050b6c9067...
In effect, there's a big gap between an existence proof and actually workable models, and the tricks of the trade do quite a lot to close the gap. (And there are almost certainly more tricks that we're still not aware of! I'm still amazed at how late in the game batch normalization was discovered.)
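For a feel of why a trick like skip connections matters in practice, here is a toy sketch. It is not taken from the linked paper, and every name in it (`make_layers`, `forward`, the width, depth and weight scale) is a made-up assumption for illustration. It only pushes one random input through a deep stack of untrained tanh layers, once without and once with skip connections, and compares how much signal survives. It says nothing about training itself.

```python
import math
import random

def make_layers(depth, width, scale, rng):
    # One width x width weight matrix per layer, entries ~ N(0, scale^2 / width).
    std = scale / math.sqrt(width)
    return [[[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
            for _ in range(depth)]

def layer(x, w):
    # f(x) = tanh(x @ w) for a single input vector x.
    width = len(x)
    return [math.tanh(sum(x[i] * w[i][j] for i in range(width)))
            for j in range(width)]

def forward(x, layers, skip):
    for w in layers:
        h = layer(x, w)
        # Skip connection: add the layer's input back onto its output.
        x = [xi + hi for xi, hi in zip(x, h)] if skip else h
    return x

def norm(x):
    return math.sqrt(sum(v * v for v in x))

rng = random.Random(0)  # fixed seed so the run is repeatable
width, depth = 32, 30   # arbitrary toy sizes
layers = make_layers(depth, width, scale=0.5, rng=rng)
x0 = [rng.gauss(0.0, 1.0) for _ in range(width)]

plain_norm = norm(forward(x0, layers, skip=False))
resid_norm = norm(forward(x0, layers, skip=True))
print(f"input {norm(x0):.3g}  plain {plain_norm:.3g}  residual {resid_norm:.3g}")
```

With this deliberately small weight scale, the plain stack shrinks the signal by roughly half per layer, so almost nothing of the input reaches the output, while the skip version keeps it at a usable magnitude. Both networks can in principle represent useful functions; only one of them starts out in a state where anything gets through.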
OTOH, so long as you're using the basic tricks of the trade, IME architecture doesn't really matter much. Our recent Kaggle competition for birdsong identification was a great example of this: pretty much everyone reported that the difference between five or so 'best practice' feature-extraction architectures (various permutations of ResNet/EfficientNet) was negligible.