Hacker News
by _ink_ 125 days ago
Interesting. Is it because they can, or is it really more expensive for them to process a bigger context?
2 comments

Attention is, at its core, quadratic wrt context length. So I'd believe that to be the case, yeah.
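To put rough numbers on that, here is a back-of-the-envelope sketch. It counts only the two n x n matmuls in a single self-attention layer and ignores the projections and MLP, which scale linearly in n. The function name and `d_model = 4096` are made up for illustration, not taken from any particular model.

```python
def attention_matmul_flops(n_tokens: int, d_model: int = 4096) -> int:
    """Approximate FLOPs for the two n x n matmuls in one self-attention layer.

    Counts Q @ K^T and scores @ V only, at 2 FLOPs per multiply-add.
    d_model is an assumed, illustrative width.
    """
    qk = 2 * n_tokens * n_tokens * d_model  # Q @ K^T -> (n, n) score matrix
    av = 2 * n_tokens * n_tokens * d_model  # scores @ V -> (n, d) output
    return qk + av


# Doubling the context length quadruples this term.
for n in (1_000, 2_000, 4_000):
    print(n, attention_matmul_flops(n))
```

So this part of the cost grows 4x every time the context doubles. Total serving cost also has terms that are linear in n, so the overall bill is not purely quadratic, but the quadratic term takes over as the context gets long.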
I've read that compute costs for LLMs scale as O(n^2) with context window size. But I think it's also a combination of limited compute availability, users' preference for Anthropic models, and Anthropic planning to IPO.