by fheinsen 2427 days ago
> How does a regular convnet do on another domain?

As far as I know, regular convnets have failed to outperform query-key-value self-attention models (i.e., transformers based on Vaswani et al.'s work) on pretty much every sequence task, including natural language tasks.

> Learning to do “reverse graphics” is only useful if you can show it is the reason behind performance improvement.

I would strongly disagree. Building systems that can learn "reverse graphics" on their own has long been a goal of computer vision. It seems to be a prerequisite for building machines that can build internal representations of the state of the physical world around them. Hinton et al.'s 2018 paper has a summary of recent efforts on this front in the "Related Work" section.

> What I’m saying is - no one has yet demonstrated a clear superiority of any capsules based model to the best available plain convnet.

No one is saying otherwise. :-) Convnets are still the right tool for most production systems in visual recognition today.

That said, I don't think a convnet can achieve 99.1% accuracy on smallNORB with only 272K parameters, after training from scratch without using any additional data or metadata of any kind -- like the model using my routing algorithm. If you think you can do that with a convnet, do it and put it up online (I'd love to see it :-)


You’re comparing sentence classification done using transformer embeddings to older results which use inferior embeddings. How do regular convnets do when you feed them transformer embeddings?

Re learning reverse graphics - ok, maybe it is indeed the main feature of your work. I’d need to look into that, because from skimming your paper it’s not immediately clear what’s going on there.

Re convnet accuracy on Norb - I’m willing to make that effort for cifar-10 as soon as you have the results.

> You’re comparing sentence classification done using transformer embeddings to older results which use inferior embeddings. How do regular convnets do when you feed them transformer embeddings?

Actually, I'm comparing it to recent models, including XLNet, MT-DNN, Snorkel, and (of course) BERT. AFAIK, convnets have not been able to outperform multihead self-attention, even on pretrained embeddings.

> Re learning reverse graphics - ok, maybe it is indeed the main feature of your work. I’d need to look into that, because from skimming your paper it’s not immediately clear what’s going on there.

I agree, it's not immediately clear. Nonetheless, I find it kind of unbelievable that a model with so few parameters can seem to do it. (I was shocked when I first saw the plots.)

> Re convnet accuracy on Norb - I’m willing to make that effort for cifar-10 as soon as you have the results.

That's a little disappointing... but OK.

Thank you so much for all your questions :-)

Ah, I missed table 4 with the recent models. I looked closer and it does look impressive; however, you should ask someone who worked on that task to review your experiments (I haven’t).

Actually, it looks like you got a solid paper. I recommend submitting either to CVPR or ICML, especially if you can get good results on cifar.

Thank you!

Yes, I think this has legs.

Maximizing "bang per bit" (a) seems to be a truly new idea, as opposed to some minor tweak on the same old thing, and (b) works better than previous methods, judging by the evidence so far.

(FWIW, we've been using this algorithm internally at work with similar outperformance over other methods, in yet another domain that is neither vision nor language... but I cannot share those results publicly.)

Before submitting this anywhere, I'd like to get more informal feedback from other AI researchers. I've reached out to people at Google Brain, Facebook AI, DeepMind, OpenAI, and a handful of top academic institutions and research groups. So far, the response has been positive, but I expect it will take everyone at least a couple of weeks, and probably longer, to read and understand the draft paper in sufficient detail to give me more than superficial comments.

New things often look like toys at first. :-)

Keep in mind that someone might steal your ideas. Right now there are probably a dozen people preparing capsule-related papers for CVPR (due in 2 weeks), so if one of them comes across your paper there’s a temptation.