by rocauc
808 days ago
Pulling out a key part of this post from a DeepMind 2023 paper[1]:
“Although the success of ViTs in computer vision is extremely impressive, in our view there is no strong evidence to suggest that pre-trained ViTs outperform pre-trained ConvNets when evaluated fairly.”

Another common constraint in vision versus language is that the long tail is very long in the visual world. There are a number of domains where you have very few examples to learn from (defects are designed to happen infrequently; rare species for identification show up, well, rarely).

And pulling from the blog: "But small models ... benefit greatly from the exact type experiment of outlined in this post: strong augmentation with limited data trained across many epochs."

[1] https://arxiv.org/pdf/2310.16764.pdf
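
To make "strong augmentation with limited data trained across many epochs" concrete, here is a rough sketch of the idea, not the blog's actual pipeline. The `augment` and `epochs` helpers are hypothetical names, the image is a plain list of rows, and a random flip plus a padded random crop stands in for whatever stronger augmentations a real setup would use. The point is only that each epoch sees a different view of the same few examples.

```python
import random


def augment(image, pad=1, rng=random):
    """Return a randomly flipped and shifted copy of a 2D image (list of rows)."""
    h, w = len(image), len(image[0])
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # Zero-pad by `pad` pixels on every side, then take a random h x w crop,
    # which shifts the content by up to `pad` pixels in each direction.
    padded_w = w + 2 * pad
    padded = (
        [[0] * padded_w for _ in range(pad)]
        + [[0] * pad + list(row) + [0] * pad for row in image]
        + [[0] * padded_w for _ in range(pad)]
    )
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    return [row[left:left + w] for row in padded[top:top + h]]


def epochs(dataset, num_epochs, seed=0):
    """Yield (epoch, augmented_image, label), re-augmenting every example each epoch."""
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        for image, label in dataset:
            yield epoch, augment(image, rng=rng), label
```

With a single rare "defect" example and 50 epochs, the model would be shown up to 18 distinct flip/shift variants of it instead of the identical pixels 50 times. A real pipeline would apply the same idea with a library's transforms on tensors.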