by p1esk
2428 days ago
How does a regular convnet do on another domain? Learning to do “reverse graphics” is only useful if you can show it is the reason behind performance improvement, compared to a plain convnet. Until we have CIFAR-10 results it’s not clear. What I’m saying is - no one has yet demonstrated a clear superiority of any capsule-based model to the best available plain convnet. Even on CIFAR-10. Looking forward to your results!
As far as I know, regular convnets have failed to outperform query-key-value self-attention models (i.e., transformers based on Vaswani et al.'s work) on pretty much every sequence task, including natural language tasks.
> Learning to do “reverse graphics” is only useful if you can show it is the reason behind performance improvement.
I would strongly disagree. Building systems that can learn "reverse graphics" on their own has long been a goal of computer vision. It seems to be a prerequisite for building machines that can build internal representations of the state of the physical world around them. Hinton et al.'s 2018 paper summarizes recent efforts on this front in its "Related Work" section.
> What I’m saying is - no one has yet demonstrated a clear superiority of any capsule-based model to the best available plain convnet.
No one is saying otherwise. :-) Convnets are still the right tool for most production systems in visual recognition today.
That said, I don't think a convnet can achieve 99.1% accuracy on smallNORB with only 272K parameters, trained from scratch without any additional data or metadata of any kind, as the model using my routing algorithm does. If you think you can do that with a convnet, do it and put it up online (I'd love to see it :-)