by eref
3140 days ago
The same problem occurs with avg pooling. Strided conv also lets you "pool" neurons in the layer below to reduce the number of neurons in subsequent layers, but in practice deeper neurons then still have trouble learning precise representations of where things are in the layer below (though much more info is retained than with avg/max pooling). Capsules can presumably learn such things much more accurately because they can, in principle, learn precise geometric mappings to infer positions independently of the viewpoint. However, the results so far are not much better than with scalar-output neurons. Capsules do perform a bit better in terms of robustness against adversarial examples and overlapping objects.
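To make the comparison concrete, here is a toy 1D sketch in plain Python. It assumes the "problem" in question is that pooling throws away where inside the window a feature fired. The function names and the kernel weights are made up for illustration; in a real network the strided-conv weights would be learned.

```python
def max_pool_1d(xs, window):
    """Non-overlapping max pooling (stride equals the window size)."""
    return [max(xs[i:i + window]) for i in range(0, len(xs) - window + 1, window)]


def avg_pool_1d(xs, window):
    """Non-overlapping average pooling (stride equals the window size)."""
    return [sum(xs[i:i + window]) / window
            for i in range(0, len(xs) - window + 1, window)]


def strided_conv_1d(xs, kernel, stride):
    """Valid (no padding) 1D cross-correlation with the given stride."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, xs[i:i + k]))
            for i in range(0, len(xs) - k + 1, stride)]


# Same feature, shifted by one position inside the first pooling window.
a = [0.0, 1.0, 0.0, 0.0]
b = [1.0, 0.0, 0.0, 0.0]

# Hypothetical asymmetric weights, standing in for learned ones.
kernel = [1.0, 2.0]

print(max_pool_1d(a, 2), max_pool_1d(b, 2))      # identical outputs
print(avg_pool_1d(a, 2), avg_pool_1d(b, 2))      # identical outputs
print(strided_conv_1d(a, kernel, 2), strided_conv_1d(b, kernel, 2))  # differ
```

Max and avg pooling map both inputs to the same output, so the layer above cannot tell them apart. The strided conv keeps them distinct only because its weights are asymmetric, which is one way to read "much more info is retained".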
|
There are some details I haven't thought through on this, but I'd imagine you'd want your stride length to be around the standard deviation of the Gaussian.
Any pointers to papers on this (or comments on why this obviously won't work) would be very welcome - I'm still trying to develop my intuition on all this!
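
For anyone who wants to play with it, here is a rough sketch of one way to read the idea: Gaussian-weighted pooling in 1D, with the stride defaulting to roughly one standard deviation. This is an assumption about what is meant, not an established method. The function names, the 3-sigma kernel cutoff, and the zero padding at the borders are all illustrative choices.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Normalized 1D Gaussian weights; cut off at ~3 sigma by default (a choice)."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(d * d) / (2 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_pool_1d(xs, sigma, stride=None):
    """Gaussian-weighted average around every `stride`-th position.

    The stride defaults to about one standard deviation, per the suggestion
    above. Positions outside the input are treated as zero.
    """
    if stride is None:
        stride = max(1, round(sigma))
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for center in range(0, len(xs), stride):
        acc = 0.0
        for offset, w in enumerate(kernel):
            idx = center + offset - radius
            if 0 <= idx < len(xs):
                acc += w * xs[idx]
        out.append(acc)
    return out


# An impulse at position 9 vs. position 10 in a length-20 signal.
imp9 = [1.0 if i == 9 else 0.0 for i in range(20)]
imp10 = [1.0 if i == 10 else 0.0 for i in range(20)]

pooled9 = gaussian_pool_1d(imp9, sigma=2.0)
pooled10 = gaussian_pool_1d(imp10, sigma=2.0)
print(len(pooled9))          # 10 outputs for 20 inputs at stride 2
print(pooled9 != pooled10)   # a one-step shift changes the pooled output
```

Because neighbouring windows overlap, a one-step shift of the input shows up as a change in the relative activations of adjacent outputs, whereas a hard max over disjoint windows would collapse some of those shifts.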