Hacker News
by danuker 1090 days ago
I suspect that being 16-bit instead of 32-bit means more of them can be packed into the same amount of memory, and some instructions can operate on several of them in parallel.

But I personally think it's a coincidence, and it just so happens that 50k tokens are enough for the level of complexity the models have right now.
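
A rough sketch of the packing argument, in Python with only the standard library. The token IDs and the 64-byte block size are assumptions picked for illustration, and 50,000 is just the approximate figure mentioned above, not an exact vocabulary size.

```python
import struct

VOCAB_SIZE = 50_000  # approximate figure from the comment, not an exact vocabulary size

# Hypothetical token IDs, chosen only for illustration.
token_ids = [0, 1, 1234, 49_999]


def pack_ids(ids, fmt_char):
    """Pack ids little-endian as fixed-width unsigned ints ('H' = 16-bit, 'I' = 32-bit)."""
    return struct.pack(f"<{len(ids)}{fmt_char}", *ids)


packed16 = pack_ids(token_ids, "H")
packed32 = pack_ids(token_ids, "I")

# A 16-bit unsigned integer has 2**16 = 65,536 distinct values,
# so every ID below VOCAB_SIZE fits.
fits_in_16_bits = VOCAB_SIZE <= 2**16

# How many IDs fit in one 64-byte block (assumed size, e.g. a common cache line).
BLOCK_BYTES = 64
ids_per_block_16 = BLOCK_BYTES // struct.calcsize("<H")
ids_per_block_32 = BLOCK_BYTES // struct.calcsize("<I")

print(len(packed16), len(packed32))        # bytes used by each encoding
print(ids_per_block_16, ids_per_block_32)  # IDs per 64-byte block
```

The same four IDs take half as many bytes at 16 bits, so twice as many fit in a block of any fixed size. That says nothing about whether the vocabulary size was chosen for this reason, which is the part that may well be a coincidence.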