Hacker News new | ask | show | jobs
by albertzeyer 889 days ago
It's not tiny, this is a quite normal size outside the field of LLMs, e.g. normal-sized language models, or also translation models, or acoustic models. Some people even would call this large.
1 comments

It's tiny by the standards of transformers, pretty sure most transformers trained (across all domains) are larger than this
No. Where do you have this from?

Looking at NeurIPS 2023:

https://openreview.net/group?id=NeurIPS.cc/2023/Conference#t...

Some random spotlight papers:

- https://openreview.net/pdf?id=YkBDJWerKg: Transformer (VPT) with 248M parameters

- https://openreview.net/pdf?id=CAF4CnUblx: Vit-B/16 with 86M parameters

- https://openreview.net/pdf?id=3PjCt4kmRx: Transformer with 282M parameters

Also, in my field (speech recognition, machine translation, language modeling), all using Transformer variants, this is a pretty normal model size.

It's tiny by the standards of LLMs; _L_LM might give you some indication as to where they fit in the overall landscape.