Looking at NeurIPS 2023:
https://openreview.net/group?id=NeurIPS.cc/2023/Conference#t...
Some random spotlight papers:
- https://openreview.net/pdf?id=YkBDJWerKg: Transformer (VPT) with 248M parameters
- https://openreview.net/pdf?id=CAF4CnUblx: ViT-B/16 with 86M parameters
- https://openreview.net/pdf?id=3PjCt4kmRx: Transformer with 282M parameters
Also, in my field (speech recognition, machine translation, language modeling), where pretty much everything uses Transformer variants, this is a pretty normal model size.
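As a sanity check on numbers like these, the 86M figure for ViT-B/16 can be roughly reproduced from the architecture alone. This is a back-of-the-envelope sketch, not taken from any of the linked papers: the function name `vit_param_count` is made up, and it assumes the standard ViT-Base hyperparameters (hidden size 768, 12 layers, MLP size 3072, 16x16 patches, 224x224 RGB input, 1000-class head, biases everywhere).

```python
def vit_param_count(hidden=768, layers=12, mlp=3072, patch=16,
                    image=224, channels=3, classes=1000):
    """Rough parameter count for a plain ViT (assumed standard layout)."""
    tokens = (image // patch) ** 2 + 1  # image patches plus one class token

    # Patch embedding: a linear projection of each flattened patch, with bias.
    patch_embed = patch * patch * channels * hidden + hidden
    # Learned class token and learned position embeddings.
    cls_and_pos = hidden + tokens * hidden

    # One Transformer block.
    attention = (hidden * 3 * hidden + 3 * hidden) + (hidden * hidden + hidden)
    feed_forward = (hidden * mlp + mlp) + (mlp * hidden + hidden)
    layer_norms = 2 * (2 * hidden)  # two LayerNorms, each with scale and shift
    block = attention + feed_forward + layer_norms

    final_norm = 2 * hidden
    head = hidden * classes + classes  # classification layer

    return patch_embed + cls_and_pos + layers * block + final_norm + head


print(f"{vit_param_count() / 1e6:.1f}M parameters")
```

Almost all of that is in the 12 Transformer blocks (about 7.1M each), which is why the count barely changes with the patch size or the number of classes.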