Good suggestion, it was tough to narrow down the list! Here is a link to the ViT paper in case others are interested [1].
According to the latest ImageNet leaderboard [2], ViT appears to have slipped to second place in top-1 accuracy. CoAtNet-7 now leads, but only by a slight margin, and apparently with a significantly larger model.