by danuker
1090 days ago
I suspect its being 16-bit rather than 32-bit means more of them can be packed tightly together, and some instructions can operate on several of them in parallel. But personally I think it's a coincidence: it just so happens that 50k tokens is enough for the level of complexity the models have right now.
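To make the packing point concrete, here is a minimal sketch. It assumes the "16 bit" refers to token IDs and takes a vocabulary of exactly 50,000 as an illustrative round number. It checks that every ID below 50,000 fits in an unsigned 16-bit integer (max 65,535) and compares the bytes needed to store the same IDs as 16-bit versus 32-bit values. The helper `packed_size` is hypothetical, written only for this illustration.

```python
import struct

# Assumption: "50k tokens" taken as exactly 50,000 for illustration.
VOCAB_SIZE = 50_000

# Largest value an unsigned 16-bit integer can hold.
UINT16_MAX = 2**16 - 1  # 65535

# Every token ID in [0, VOCAB_SIZE) fits in 16 bits.
assert VOCAB_SIZE - 1 <= UINT16_MAX


def packed_size(token_ids, fmt):
    """Hypothetical helper: bytes needed to store token_ids using struct
    format code fmt ('H' = unsigned 16-bit, 'I' = unsigned 32-bit),
    little-endian with standard sizes."""
    return len(struct.pack("<%d%s" % (len(token_ids), fmt), *token_ids))


ids = [0, 1234, 49_999, 42]
print(packed_size(ids, "H"))  # 8 bytes (2 per ID)
print(packed_size(ids, "I"))  # 16 bytes (4 per ID)
```

Halving the storage per ID also means a fixed-width vector register holds twice as many of them (e.g. 8 sixteen-bit lanes versus 4 thirty-two-bit lanes in 128 bits), which is where the "in parallel" part would come from.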