by brookst
307 days ago
Yes. It is not incredibly rare; it's incredibly common. A huge percentage of queries to retail LLMs are things like "hello" and "what can you do", with static system prompts that make the total context identical. It's worth maybe a 3% reduction in GPU usage. So call it a half billion dollars a year or so for a medium-to-large service.
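
To make that concrete, here is a rough sketch of what reusing work for identical contexts could look like, assuming the simplest possible version: an exact-match cache keyed on a hash of the system prompt plus the user message. The comment doesn't say how any real service does this, so `ExactMatchCache`, `complete`, and `fake_model` are all made-up names, and the model call is a stand-in.

```python
import hashlib
from typing import Callable, Dict


class ExactMatchCache:
    """Hypothetical response cache keyed on the full (system + user) context."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate  # stand-in for the expensive GPU-backed call
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(system_prompt: str, user_message: str) -> str:
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h = hashlib.sha256()
        for part in (system_prompt, user_message):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        return h.hexdigest()

    def complete(self, system_prompt: str, user_message: str) -> str:
        key = self._key(system_prompt, user_message)
        if key in self._store:
            self.hits += 1  # identical context seen before: no model call
            return self._store[key]
        self.misses += 1
        reply = self._generate(system_prompt, user_message)
        self._store[key] = reply
        return reply


def fake_model(system_prompt: str, user_message: str) -> str:
    # Placeholder for real inference; assumption for illustration only.
    return f"echo: {user_message}"


cache = ExactMatchCache(fake_model)
for msg in ["hello", "what can you do", "hello", "hello"]:
    cache.complete("You are a helpful assistant.", msg)
print(cache.hits, cache.misses)
```

One design caveat: serving a stored reply means everyone who sends the same context gets the same text, so this only fits where sampling variety doesn't matter. Greetings and "what can you do" are probably the easy case.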