> You’re comparing sentence classification done using transformer embeddings to older results that use inferior embeddings. How do regular convnets do when you feed them transformer embeddings?

Actually, I'm comparing it to recent models, including XLNet, MT-DNN, Snorkel, and (of course) BERT. AFAIK, convnets have not been able to outperform multi-head self-attention, even on pretrained embeddings.

> Re learning reverse graphics: OK, maybe it is indeed the main feature of your work. I’d need to look into that, because from skimming your paper it’s not immediately clear what’s going on there.

I agree, it's not immediately clear. Nonetheless, I find it kind of unbelievable that a model with so few parameters appears to do it. (I was shocked when I first saw the plots.)

> Re convnet accuracy on NORB: I’m willing to make that effort for CIFAR-10 as soon as you have the results.

That's a little disappointing... but OK. Thank you so much for all your questions :-)
Actually, it looks like you have a solid paper. I recommend submitting to either CVPR or ICML, especially if you can get good results on CIFAR.