Hacker News
by fxtentacle 2243 days ago
My personal experience with capsule networks is that they didn't work better than a similar number of ungrouped neurons in any of the cases I tried.

If capsules work wonders for you, my first guess would be that you can improve your training of the standard network to make it work equally well.

In general, my hunch is that capsules are still too low-level and too local a change to make a strong difference.

To give an example, all of the state-of-the-art optical flow AIs are based on building cost volumes and then resolving them. There are edge cases where one can prove mathematically that reducing the cost volume to a flow direction makes it impossible to produce the correct result. So to make a significant contribution, it isn't enough to use capsules in the feature-processing stage; you need to replace the entire architecture.
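A one-dimensional toy version may make this kind of edge case concrete. The sketch below builds a cost volume from absolute intensity differences, then reduces each row to one displacement with either a hard argmin or a soft-argmin (the expected displacement under a softmax over negative costs). It assumes the problematic case is a pixel with two equally valid matches, for example a transparent surface moving differently from what lies behind it; the comment doesn't say which edge cases it means, and real optical flow uses 2-D displacements and learned features. All function names are hypothetical.

```python
import math
from typing import List

INF = float("inf")


def cost_volume_1d(left: List[float], right: List[float], max_disp: int) -> List[List[float]]:
    """Matching cost for every pixel x and candidate displacement d in 0..max_disp.

    cost[x][d] = |left[x] - right[x - d]|, or +inf when x - d falls outside the image.
    """
    volume = []
    for x in range(len(left)):
        row = []
        for d in range(max_disp + 1):
            xr = x - d
            row.append(abs(left[x] - right[xr]) if 0 <= xr < len(right) else INF)
        volume.append(row)
    return volume


def hard_argmin(costs: List[float]) -> int:
    """Pick the single cheapest displacement (ties go to the smallest d)."""
    return min(range(len(costs)), key=lambda d: costs[d])


def soft_argmin(costs: List[float], temperature: float = 0.1) -> float:
    """Collapse a cost row to E[d] under softmax(-cost / temperature).

    Assumes at least one finite cost in the row.
    """
    best = min(costs)
    weights = [0.0 if c == INF else math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


if __name__ == "__main__":
    # A bright pixel at x=2 in `left` appears at x=1 in `right`: one clear match at d=1.
    left = [0.0, 0.0, 9.0, 0.0]
    right = [0.0, 9.0, 0.0, 0.0]
    volume = cost_volume_1d(left, right, max_disp=2)
    print(volume[2], hard_argmin(volume[2]), soft_argmin(volume[2]))

    # A hypothetical row with two equally good matches, at d=1 and d=5.
    row = [5.0, 0.0, 5.0, 5.0, 5.0, 0.0, 5.0]
    print(hard_argmin(row), soft_argmin(row), row[round(soft_argmin(row))])
```

For the two-minimum `row`, `soft_argmin` lands on 3, a displacement whose own cost is high, while `hard_argmin` keeps 1 and silently drops 5. Neither reduction can output both motions, which is one way to read the claim that the reduction step, not the feature extractor, limits accuracy.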


Thank you. You may be right. To some extent we're all guessing based on our own hunches :-)

FWIW, I've had the most success with EM/Heinsen-type routing algorithms -- that is, those in which each output capsule is generated by a probabilistic model (such as a Gaussian mixture), and the output capsule activates only to the extent its model can explain (i.e., generate) its view of input data better (in some quantifiable manner) than other output capsules. The notion that an output capsule "must explain input data better than other capsules in order to activate" is very appealing to me as a mechanism for inducing per-layer "explainability" in models.
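A toy sketch of the core idea may help readers unfamiliar with this family of routing methods: each output capsule fits a diagonal Gaussian to the votes it receives, and each input capsule splits itself among the output capsules in proportion to how well each one explains its vote. This is plain EM for a Gaussian mixture adapted to per-output votes, not Heinsen's algorithm or the EM routing of Hinton et al. The function names, the fixed iteration count, the variance floor `eps`, and defining activation as the share of input mass an output ends up explaining are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def log_gauss(x: Vector, mu: Vector, var: Vector) -> float:
    """Log-density of x under a Gaussian with mean mu and diagonal covariance var."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))


def gmm_routing(votes: List[List[Vector]], a_in: List[float],
                n_iters: int = 3, eps: float = 1e-3) -> Tuple[List[float], List[Vector]]:
    """Toy EM-style routing (hypothetical helper).

    votes[i][j] is the vote (pose prediction) of input capsule i for output capsule j;
    a_in[i] is the activation of input capsule i. Returns (activations, means) of
    the output capsules; means come from the last M-step.
    """
    n_in, n_out, dim = len(votes), len(votes[0]), len(votes[0][0])
    total = sum(a_in)
    R = [[1.0 / n_out] * n_out for _ in range(n_in)]  # routing shares, rows sum to 1
    for _ in range(n_iters):
        # M-step: fit one diagonal Gaussian per output capsule to the votes it
        # receives, weighting each vote by input activation times routing share.
        mus, variances, mix = [], [], []
        for j in range(n_out):
            w = [R[i][j] * a_in[i] for i in range(n_in)]
            w_sum = sum(w) + 1e-12
            mu = [sum(w[i] * votes[i][j][d] for i in range(n_in)) / w_sum
                  for d in range(dim)]
            var = [sum(w[i] * (votes[i][j][d] - mu[d]) ** 2 for i in range(n_in)) / w_sum + eps
                   for d in range(dim)]
            mus.append(mu)
            variances.append(var)
            mix.append(w_sum / total)
        # E-step: each input splits itself across output capsules in proportion to
        # how well each output's Gaussian explains (generates) that input's vote.
        for i in range(n_in):
            logp = [math.log(mix[j] + 1e-12) + log_gauss(votes[i][j], mus[j], variances[j])
                    for j in range(n_out)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]  # numerically stable softmax
            z = sum(p)
            R[i] = [pj / z for pj in p]
    # Activation = share of total input mass an output capsule explains, so an
    # output only activates by explaining its votes better than its rivals do.
    acts = [sum(R[i][j] * a_in[i] for i in range(n_in)) / total for j in range(n_out)]
    return acts, mus


if __name__ == "__main__":
    # Hypothetical data: 4 input capsules, 2 output capsules, 2-D poses.
    # Votes for output 0 agree closely; votes for output 1 are scattered.
    tight = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]]
    loose = [[0.0, 0.0], [3.0, -2.0], [-2.0, 3.0], [4.0, 4.0]]
    votes = [[t, l] for t, l in zip(tight, loose)]
    acts, mus = gmm_routing(votes, a_in=[1.0, 1.0, 1.0, 1.0])
    print(acts, mus[0])
```

In the demo, the output capsule whose votes agree ends up with almost all of the activation, because its tight Gaussian assigns those votes far higher likelihood than the broad Gaussian of its rival.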

In my experience so far, routing tends to work better on top of conventional architectures, e.g., using a ResNet for feature detection and stacking two or more routing layers on top, first to classify into hidden factors and then into training labels. Also, to get models to converge, I have found it helps to apply a nonlinear transformation to the features and then stack at least two routing layers on top. (I don't have a good explanation as to why two or more tend to work better than only one.) Finally, I usually feed only the capsule activations to the loss function -- that is, during training I let the capsules themselves "do whatever they want" to learn to explain input data.
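The last point, feeding only the capsule activations to the loss, can be sketched concretely. Assuming the final routing layer yields one (activation, pose) pair per label, the loss below is a softmax cross-entropy computed from the activations alone, treating them as logits. That choice, the `Capsule` tuple layout, and the function name are assumptions for illustration, not taken from the comment.

```python
import math
from typing import List, Sequence, Tuple

Capsule = Tuple[float, Sequence[float]]  # (activation, pose); the pose is never read below


def activation_only_loss(out_capsules: List[Capsule], label: int) -> float:
    """Softmax cross-entropy over the output capsules' activations.

    Only activations reach the loss; poses are left unconstrained, so the
    routing layers are free to use them however helps explain the input.
    """
    scores = [a for a, _pose in out_capsules]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))  # stable log-sum-exp
    return log_z - scores[label]


if __name__ == "__main__":
    # Hypothetical final-layer output: 3 label capsules with 2-D poses.
    caps = [(2.0, [0.1, 0.2]), (0.5, [9.0, -3.0]), (0.1, [0.0, 0.0])]
    print(activation_only_loss(caps, label=0), activation_only_loss(caps, label=2))
```

Because the final-layer poses never enter the loss, changing them leaves the loss unchanged; whatever they encode is left to the routing layers.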