|
$1500/mo is $18,000/seat/annum. Maybe Microsoft and Nvidia are on to something. 128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction? |
I find anything below 50 tps or so entirely unusable...
Regardless its Apples to oranges anyway, inference is quite cheap for open weight models its just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various provider on OpenRouter since open models are a commodity.