| HN Mirror



	by albertzeyer 889 days ago
	It's not tiny; this is a fairly normal size outside the field of LLMs, e.g. for normal-sized language models, translation models, or acoustic models. Some people would even call this large.


apsec112 889 days ago

It's tiny by the standards of transformers; I'm pretty sure most transformers trained (across all domains) are larger than this.

albertzeyer 889 days ago

No. Where did you get that from?

Looking at NeurIPS 2023:

https://openreview.net/group?id=NeurIPS.cc/2023/Conference#t...

Some random spotlight papers:

- https://openreview.net/pdf?id=YkBDJWerKg: Transformer (VPT) with 248M parameters

- https://openreview.net/pdf?id=CAF4CnUblx: ViT-B/16 with 86M parameters

- https://openreview.net/pdf?id=3PjCt4kmRx: Transformer with 282M parameters
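For a sense of where a figure like "86M" comes from, here is a back-of-the-envelope parameter count. It assumes the standard ViT-Base/16 configuration (224x224 RGB input, 16x16 patches, 12 encoder blocks, hidden size 768, MLP size 3072, learned class token and position embeddings, biases everywhere). The function name `vit_param_count` and its defaults are hypothetical, chosen for illustration; this is a sketch, not the paper's own accounting.

```python
def vit_param_count(image_size=224, patch_size=16, channels=3, dim=768,
                    depth=12, mlp_dim=3072, num_classes=0):
    """Rough parameter count for a ViT encoder (hypothetical helper)."""
    num_patches = (image_size // patch_size) ** 2
    # Patch embedding: linear projection of each flattened patch, with bias.
    patch_embed = patch_size * patch_size * channels * dim + dim
    # Learned class token plus one position embedding per patch and for the class token.
    cls_token = dim
    pos_embed = (num_patches + 1) * dim
    # Per block: Q, K, V and output projections, each dim x dim with bias.
    attn = 4 * (dim * dim + dim)
    # Per block: two-layer MLP (dim -> mlp_dim -> dim), with biases.
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)
    # Per block: two LayerNorms, each with a scale and a shift vector.
    norms = 2 * (2 * dim)
    block = attn + mlp + norms
    # LayerNorm after the last block.
    final_norm = 2 * dim
    # Optional linear classification head on the class token.
    head = (dim * num_classes + num_classes) if num_classes else 0
    return patch_embed + cls_token + pos_embed + depth * block + final_norm + head


print(f"{vit_param_count() / 1e6:.1f}M")                  # encoder only: 85.8M
print(f"{vit_param_count(num_classes=1000) / 1e6:.1f}M")  # with a 1000-class head: 86.6M
```

Almost all of it (about 85M) sits in the 12 encoder blocks; embeddings and the head add well under 1M each.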

Also, in my field (speech recognition, machine translation, language modeling), where the models are all Transformer variants, this is a pretty normal model size.

kristjansson 889 days ago

It's tiny by the standards of LLMs; _L_LM might give you some indication as to where they fit in the overall landscape.