Hacker News
by albertzeyer 889 days ago
No. Where did you get this from?

Looking at NeurIPS 2023:

https://openreview.net/group?id=NeurIPS.cc/2023/Conference#t...

Some random spotlight papers:

- https://openreview.net/pdf?id=YkBDJWerKg: Transformer (VPT) with 248M parameters

- https://openreview.net/pdf?id=CAF4CnUblx: ViT-B/16 with 86M parameters

- https://openreview.net/pdf?id=3PjCt4kmRx: Transformer with 282M parameters

Also, in my field (speech recognition, machine translation, language modeling), where everything uses Transformer variants, this is a pretty normal model size.