|
|
|
|
|
by pqtyw
16 days ago
|
|
How is tok/s not a bottleneck I? I assume most people still use ai agents interactively rather than leaving them to do their own thing during the night. I find anything below 50 tps or so entirely unusable... Regardless its Apples to oranges anyway, inference is quite cheap for open weight models its just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various provider on OpenRouter since open models are a commodity. |
|