by tbalsam
403 days ago
As someone who's done a fair bit of architecture work -- both are important! Making it either/or is a very silly thing; each is the limiting factor for the other and there are no two ways about it. Also, for classification, MaxPooling is often far superior, you can learn an average smoothing filter in your convolutions beforehand in a data-dependent manner so that Nyquist sampling stuff is properly preserved. Also, please use smoothed cross-entropy for image classification (generally speaking, unless maybe the dataset is hilariously large), MSE won't nearly cut it! That being said, adaptive pooling certainly is great when doing classification. Something to note is that batching does become an issue at a certain point -- as do certain other fine-grained details, if you're simply going to average it all down to one single vector (IIUC).
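For readers unfamiliar with the term, "smoothed cross-entropy" is read here as cross-entropy with label smoothing; that reading is an assumption, as are the smoothing value of 0.1 and the example logits. A minimal pure-Python sketch of one common convention, in which the smoothing mass is spread uniformly over all classes:

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    The target puts (1 - smoothing) on the true class and spreads
    `smoothing` uniformly over all classes (one common convention;
    the value 0.1 is an illustrative assumption).
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    k = len(logits)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / k + ((1.0 - smoothing) if i == target else 0.0)
        loss -= q * lp
    return loss

logits = [2.0, 0.5, -1.0]  # hypothetical scores for three classes
hard = smoothed_cross_entropy(logits, target=0, smoothing=0.0)  # plain cross-entropy
soft = smoothed_cross_entropy(logits, target=0, smoothing=0.1)
print(hard, soft)
```

With `smoothing=0.0` this reduces to ordinary cross-entropy. With a positive value, a confident correct prediction is penalised slightly more, which discourages over-confident logits.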
|
Of course. The MSE here is not intended as a training loss, but as a way to demonstrate that both approaches lead to almost the same result, apart from some rounding error. The MSE is on the order of 10^-9.
> Also, for classification, MaxPooling is often far superior, you can learn an average smoothing filter in your convolutions beforehand in a data-dependent manner so that Nyquist sampling stuff is properly preserved.
I don't think that max pooling the last feature maps would be a good idea here: only the maximum of each 7x7 map receives a gradient, so about 98% of the gradients (48 of 49 positions) would be cut off and training would take much longer. (The shape of the input feature map is (1, 768, 7, 7), pooled to (1, 768, 1, 1).)
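The 98% figure can be checked with a small sketch. It is pure Python, uses a single 7x7 channel filled with random values (an illustrative assumption, not the actual feature maps), and the helper names are hypothetical:

```python
import random

def global_max_pool_grad(fmap):
    """Gradient of max(fmap) w.r.t. each entry: 1 at the argmax, 0 elsewhere."""
    flat = [v for row in fmap for v in row]
    winner = flat.index(max(flat))  # ties go to the first maximum
    return [1.0 if i == winner else 0.0 for i in range(len(flat))]

def global_avg_pool_grad(fmap):
    """Gradient of mean(fmap) w.r.t. each entry: 1/N everywhere."""
    n = sum(len(row) for row in fmap)
    return [1.0 / n] * n

random.seed(0)
# One 7x7 channel, standing in for one of the 768 channels.
fmap = [[random.gauss(0.0, 1.0) for _ in range(7)] for _ in range(7)]

max_grad = global_max_pool_grad(fmap)
avg_grad = global_avg_pool_grad(fmap)

zero_fraction_max = max_grad.count(0.0) / len(max_grad)
zero_fraction_avg = avg_grad.count(0.0) / len(avg_grad)
print(f"max pool: {zero_fraction_max:.1%} of positions get zero gradient")
print(f"avg pool: {zero_fraction_avg:.1%} of positions get zero gradient")
```

Global max pooling passes the gradient to a single position per channel, so 48/49 (about 98%) of the positions receive none. Global average pooling gives every position a gradient of 1/49.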
> Something to note is that batching does become an issue at a certain point
Could you elaborate on that?