by martinald
291 days ago
Thanks for the correction (author here). I'll update the article - very fair point on the compute for input tokens, which I messed up. Tbh I'm pleased my napkin math was only 7x off the laws of physics :). Even rerunning the math on my use cases with a much higher input token cost doesn't change much, though.
The component about requiring long context lengths to be compute-bound for attention is also quite misleading.