by apsec112 889 days ago
It's tiny by the standards of transformers. I'm pretty sure most transformers trained (across all domains) are larger than this.

No. Where did you get this from?

Looking at NeurIPS 2023:

https://openreview.net/group?id=NeurIPS.cc/2023/Conference#t...

Some random spotlight papers:

- https://openreview.net/pdf?id=YkBDJWerKg: Transformer (VPT) with 248M parameters

- https://openreview.net/pdf?id=CAF4CnUblx: ViT-B/16 with 86M parameters

- https://openreview.net/pdf?id=3PjCt4kmRx: Transformer with 282M parameters

Also, in my field (speech recognition, machine translation, language modeling), where everything uses Transformer variants, this is a pretty normal model size.

It's tiny by the standards of LLMs; the _L_ in _L_LM might give you some indication of where they fit in the overall landscape.