by tourist_on_road 1569 days ago
Good point. Because transformers lack the strong built-in inductive biases of CNNs, it is difficult to train a decent model from scratch on a small dataset. However, recent research directions try to address this problem [1].

Baking some sort of domain-specific inductive bias into the model architecture itself can also address this problem [2].
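
To make the idea of an architectural inductive bias concrete, here is a minimal sketch of a convolutional tokenizer that replaces plain patch embedding, loosely in the spirit of [1] and [2]. It is not the exact architecture from either paper: the class name `ConvTokenizer`, the layer sizes, and the single conv block are assumptions chosen for illustration.

```python
import torch
from torch import nn


class ConvTokenizer(nn.Module):
    """Hypothetical tokenizer: a small conv stack instead of patch embedding."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 128):
        super().__init__()
        # Convolution and pooling give locality and translation equivariance,
        # which plain patch embedding does not provide.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)                      # (batch, embed_dim, H', W')
        return x.flatten(2).transpose(1, 2)   # (batch, H' * W', embed_dim)
```

The resulting token sequence can be passed to a standard encoder, for example `nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)`. With 32x32 inputs and the sizes above, each image becomes a sequence of 16 * 16 = 256 tokens.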

[1]: Escaping the Big Data Paradigm with Compact Transformers: https://arxiv.org/abs/2104.05704

[2]: CvT: Introducing Convolutions to Vision Transformers: https://arxiv.org/abs/2103.15808


Maybe a naive question: is there no transfer learning with transformers? I've done a lot of work with CNN architectures on small datasets, and I almost always start with something trained on ImageNet and fine-tune, or do some kind of semi-supervised training to start. Can we do that with ViT et al. as well? Or are they really usually trained from scratch?
Lots of people do transfer learning with transformers. ViT[0] originally did this for CIFAR. Then DeiT[1] introduced some knowledge transfer (note: their student is larger than the teacher). ViT was pretrained on both ImageNet-21k and JFT-300M.

CCT ([1] from above) was focused on training from scratch.

There are two paradigms to be aware of: pre-training and training from scratch. ImageNet pre-training can often be beneficial, but it doesn't always help. It really depends on the problem you're trying to tackle and on whether the target dataset shares features with the pre-training dataset. If there is low similarity, you might as well train from scratch. Also, you might not want models as large as ViT and DeiT (ViT has more parameters than CIFAR-10 has features).
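
As a rough sanity check on the size comparison, here is a back-of-the-envelope calculation. It assumes "features" means raw pixel values in the CIFAR-10 training set (one possible reading, not necessarily the intended one) and uses the approximate parameter counts reported in the ViT paper [0].

```python
# CIFAR-10 training set: 50,000 colour images of 32x32 pixels.
train_images = 50_000
values_per_image = 32 * 32 * 3          # height * width * RGB channels
train_values = train_images * values_per_image

# Approximate parameter counts from the ViT paper (rounded to millions).
vit_params = {
    "ViT-Base": 86_000_000,
    "ViT-Large": 307_000_000,
    "ViT-Huge": 632_000_000,
}

# Ratio of model parameters to raw training pixel values.
ratios = {name: count / train_values for name, count in vit_params.items()}

print(f"CIFAR-10 training pixel values: {train_values:,}")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} parameters per training pixel value")
```

Under this reading, ViT-Base comes in below the training-set pixel count, while ViT-Large and ViT-Huge exceed it.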

Disclosure: Author on CCT

[0] https://arxiv.org/abs/2010.11929

[1] https://arxiv.org/abs/2012.12877

Awesome, thanks for the reply. It's been on my list to try transformers instead of (mainly) ResNet for a while now.
Sure thing. Also, if you're getting into transformers, I'd recommend lucidrains's GitHub[0], since it has a large collection of them with links to the papers. It's nice that things are consolidated.

[0] https://github.com/lucidrains/vit-pytorch