by p1esk 2427 days ago
You’re comparing sentence classification done using transformer embeddings to older results which use inferior embeddings. How do regular convnets do when you feed them transformer embeddings?

Re learning reverse graphics - ok, maybe it is indeed the main feature of your work. I’d need to look into that, because from skimming your paper it’s not immediately clear what’s going on there.

Re convnet accuracy on Norb - I’m willing to make that effort for cifar-10 as soon as you have the results.


> You’re comparing sentence classification done using transformer embeddings to older results which use inferior embeddings. How do regular convnets do when you feed them transformer embeddings?

Actually, I'm comparing it to recent models, including XLNet, MT-DNN, Snorkel, and (of course) BERT. AFAIK, convnets have not been able to outperform multihead self-attention, even on pretrained embeddings.

> Re learning reverse graphics - ok, maybe it is indeed the main feature of your work. I’d need to look into that, because from skimming your paper it’s not immediately clear what’s going on there.

I agree, it's not immediately clear. Nonetheless, I find it kind of unbelievable that a model with so few parameters can seem to do it. (I was shocked when I first saw the plots.)

> Re convnet accuracy on Norb - I’m willing to make that effort for cifar-10 as soon as you have the results.

That's a little disappointing... but OK.

Thank you so much for all your questions :-)

Ah, I missed Table 4 with the recent models. I looked closer and it does look impressive; however, you should ask someone who has worked on that task to review your experiments (I haven’t).

Actually, it looks like you’ve got a solid paper. I recommend submitting to either CVPR or ICML, especially if you can get good results on CIFAR.

Thank you!

Yes, I think this has legs.

Maximizing "bang per bit" (a) seems truly a new idea, as opposed to some minor tweak on the same old thing, and (b) the evidence so far shows it works better than previous methods.

(FWIW, we've been using this algorithm internally at work with similar outperformance over other methods, in yet another domain that is neither vision nor language... but I cannot share those results publicly.)

Before submitting this anywhere, I'd like to get more informal feedback from other AI researchers. I've reached out to people at Google Brain, Facebook AI, DeepMind, OpenAI, and a handful of top academic institutions and research groups. So far, the response has been positive, but I expect it will take everyone at least a couple of weeks, and probably longer, to read and understand the draft paper in sufficient detail to give me more than superficial comments.

New things often look like toys at first. :-)

Keep in mind that someone might steal your ideas. Right now there are probably a dozen people preparing capsule-related papers for CVPR (due in 2 weeks), so if one of them comes across your paper, there’s a temptation.

Thank you for saying that. Sometimes I forget how petty and small people can be, especially when they are under pressure, academic and otherwise.

I'll take a look at submitting it to CVPR.

In the meantime, please circulate my work. It's on record, online. The more people who are aware that others have seen it, the less likely someone will try to plagiarize it.

I'm not under any kind of academic pressure, so I don't need citations, conference slots, etc. But I do deserve credit for this, don't you think?

PS. And now that you mention it, a couple of people to whom I reached out mentioned they were under deadline over the next two weeks.

PPS. Send me an email!