by tourist_on_road
1569 days ago
Good point. Transformers have little built-in inductive bias, which makes it difficult to train a decent model from scratch on a small dataset. However, recent research tries to address this problem [1]. Baking some domain-specific inductive bias into the model architecture itself can also help [2].

[1]: Escaping the Big Data Paradigm with Compact Transformers: https://arxiv.org/abs/2104.05704
[2]: CvT: Introducing Convolutions to Vision Transformers: https://arxiv.org/abs/2103.15808
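
To make the "bake in an inductive bias" idea a bit more concrete, here is a rough pure-Python toy (my own illustration, not code from either paper). It only extracts the pixel windows each token would see, with no learned weights; the names `window_tokens` and `shared_pixels` are made up for this sketch. With stride equal to the window size you get non-overlapping ViT-style patches; with a smaller stride you get the overlapping windows a convolutional tokenizer works with, so neighbouring tokens share pixels and locality is built in.

```python
def window_tokens(image, size, stride):
    """Slide a size x size window over a 2D list and flatten each window into a token."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            tokens.append(
                [image[top + dr][left + dc] for dr in range(size) for dc in range(size)]
            )
    return tokens


def shared_pixels(token_a, token_b):
    """Count pixels two tokens have in common (valid here because every pixel value is a unique id)."""
    return len(set(token_a) & set(token_b))


# 4x4 toy "image" whose pixel values are unique ids 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]

# Assumption: stride == size stands in for plain patch embedding.
vit_style = window_tokens(image, size=2, stride=2)
# Assumption: stride < size stands in for a convolutional tokenizer's receptive fields.
conv_style = window_tokens(image, size=3, stride=1)

print(shared_pixels(vit_style[0], vit_style[1]))    # 0: adjacent patches are disjoint
print(shared_pixels(conv_style[0], conv_style[1]))  # 6: adjacent windows overlap
```

In a real model each window would go through learned convolution weights before reaching the attention layers; the point here is only that overlap ties neighbouring tokens together before attention has to learn that from data.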