HN Mirror


by albertzeyer 889 days ago

No. Where did you get that from?

Looking at NeurIPS 2023:

Some randomly picked spotlight papers:

- https://openreview.net/pdf?id=YkBDJWerKg: Transformer (VPT) with 248M parameters

Also, in my field (speech recognition, machine translation, language modeling), where nearly all models are Transformer variants, this is a fairly typical model size.
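For a sense of scale, here is a rough parameter count for a plain Transformer stack. This is a sketch that assumes the common layout (four d×d attention projections with biases, a two-layer feed-forward block, two LayerNorms per layer) and ignores embeddings unless a vocabulary size is passed. The function names and the configurations are illustrative assumptions, not the settings of the VPT paper above.

```python
def transformer_block_params(d_model: int, d_ff: int) -> int:
    """Parameters in one standard Transformer layer (assumed layout)."""
    # Q, K, V and output projections: four d_model x d_model matrices plus biases.
    attn = 4 * d_model * d_model + 4 * d_model
    # Feed-forward: d_model -> d_ff -> d_model, each linear layer with a bias.
    ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model
    # Two LayerNorms, each with a scale and a bias vector of size d_model.
    norms = 2 * 2 * d_model
    return attn + ffn + norms


def transformer_params(n_layers: int, d_model: int, d_ff: int, vocab_size: int = 0) -> int:
    """Total for a stack of identical layers, plus an optional token embedding table."""
    return n_layers * transformer_block_params(d_model, d_ff) + vocab_size * d_model


if __name__ == "__main__":
    # Hypothetical example configurations (depth, width, feed-forward width).
    for n_layers, d_model, d_ff in [(12, 768, 3072), (24, 1024, 4096)]:
        total = transformer_params(n_layers, d_model, d_ff)
        print(f"{n_layers} layers, d_model={d_model}: {total / 1e6:.1f}M parameters")
```

With these assumptions a 12-layer, 768-wide stack comes to about 85M parameters and a 24-layer, 1024-wide stack to about 302M, so a model in the 250M range corresponds to a fairly ordinary depth and width.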