Today. But what about in 5 years? Would you bet we will be paying hundreds of billions to OpenAI yearly or buying consumer GPUs? I know what I will be doing.
But the progress goes both ways: in five years, you would still want to use whatever is running in the cloud supercenters. Just like today you could run GPT-2 locally as a coding agent, but we want the 100x-as-powerful shiny thing.
That would be great if it were the case, but my understanding is that progress is plateauing. I don't know how much of this is Anthropic / Google / OpenAI holding back to save money and how much is state-of-the-art improvement actually slowing down, though. I can imagine there being a 64 GB GPU in five years, as absurd as it feels to type that today.
> I'm finding the difference just between Sonnet 4 and Sonnet 4.5 to be meaningful in terms of the complexity of tasks I'm willing to use them for.
That doesn't mean "not plateauing".
It's better, certainly, but the difference between SOTA now and SOTA 6 months ago is a fraction of the difference between SOTA 6 months ago and SOTA 18 months ago.
It doesn't mean that the models aren't getting better; it means that the improvement in each generation is smaller than the improvement in the previous generation.
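To make that definition concrete, here's a minimal sketch of the check being described: scores keep rising, but each generation's gain is smaller than the one before it. The function names and the example scores are made up purely for illustration; they are not real benchmark results.

```python
def generation_gains(scores):
    """Return the improvement from each generation to the next."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def is_plateauing(scores):
    """True if every generation still improves, but by less than the previous one did.

    This only tests for shrinking gains; it does not prove there is a hard ceiling.
    """
    gains = generation_gains(scores)
    if len(gains) < 2:
        return False  # need at least two gains to compare them
    still_improving = all(gain > 0 for gain in gains)
    gains_shrinking = all(b < a for a, b in zip(gains, gains[1:]))
    return still_improving and gains_shrinking


# Hypothetical scores for four successive generations (NOT real data).
hypothetical_scores = [40, 60, 70, 74]
print(generation_gains(hypothetical_scores))
print(is_plateauing(hypothetical_scores))
```

Note that "still improving" and "plateauing" are both true for that made-up series, which is the whole point of the distinction.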
18 months ago to 6 months ago was indeed a busy period - both multimodal image input and reasoning models were rare at the start of that time period and common by the end of it.
Comparing a 12 month period to a 6 month period feels unfair to me though. I think we will have a much fuller picture by the end of the year - I have high expectations for the next wave of Chinese models and for Gemini 3.
> Comparing a 12 month period to a 6 month period feels unfair to me though.
Okay. Let me clarify then.
The difference between SOTA now and SOTA 6 months ago is a fraction of the difference between SOTA 6 months ago and SOTA 12 months ago.
That is still "plateauing". The performance of the models, should you take the time to chart them, is clearly asymptotic and we're in the flattening-out phase now.
I also observe that all the models are converging on roughly the same performance, which makes me think that we are approaching a maximum with the current approach.
Paying for compute in the cloud. That’s what I am betting on. Multiple providers, different data center players. There may be healthy margins for them but I would bet it’s always going to be relatively cheaper for me to pay for the compute rather than manage it myself.
> There may be healthy margins for them but I would bet it’s always going to be relatively cheaper for me to pay for the compute rather than manage it myself.
Depends almost completely on usage. No one is renting out hardware 24x7 and making a loss on it.
If you only have sporadic use then renting is better. If you're running it almost all the time then purchasing it outright is better.
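As a rough sketch of that rent-vs-buy trade-off, the break-even point is just the purchase price divided by what you save per hour by not renting. All parameter names and the numbers in the usage lines are hypothetical placeholders, not real prices, and this ignores resale value, depreciation and your own time.

```python
def break_even_hours(purchase_price, hourly_rental_rate, hourly_running_cost=0.0):
    """Hours of use at which owning costs the same as renting.

    hourly_running_cost is what owning costs you per hour of use (e.g. electricity).
    Returns infinity if renting is never more expensive per hour than running your own.
    """
    saving_per_hour = hourly_rental_rate - hourly_running_cost
    if saving_per_hour <= 0:
        return float("inf")
    return purchase_price / saving_per_hour


def cheaper_option(hours_used, purchase_price, hourly_rental_rate, hourly_running_cost=0.0):
    """Return "buy" or "rent" for a given total number of hours of use."""
    rent_total = hours_used * hourly_rental_rate
    buy_total = purchase_price + hours_used * hourly_running_cost
    return "buy" if buy_total < rent_total else "rent"


# Hypothetical numbers: a 1600 (any currency) card vs. renting at 1.0 per hour.
print(break_even_hours(1600, 1.0))
print(cheaper_option(100, 1600, 1.0))   # sporadic use
print(cheaper_option(5000, 1600, 1.0))  # near-constant use
```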
Sure, but we were talking about gaming rigs to run models locally. You are describing some extreme edge-case folks who keep work running 24/7 on gaming rigs in their homes.
> Sure, but we were talking about gaming rigs to run models locally. You are describing some extreme edge-case folks who keep work running 24/7 on gaming rigs in their homes.
In that scenario the case is even weaker for the rented-hardware model - if you're going to have a gaming rig, you're only paying a little bit more on top for a GPU with more RAM, not the full cost of the rig.
The comparison then is the extra cost of using a 24GB GPU over a standard gaming rig GPU (12GB? 8GB?) versus the cost of renting the GPU whenever you need it.
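Put as arithmetic, only the price premium of the bigger card counts against renting, since the rest of the rig gets bought either way. A minimal sketch, with hypothetical function names and placeholder prices (not real quotes):

```python
WEEKS_PER_YEAR = 52


def upgrade_premium(baseline_gpu_price, larger_gpu_price):
    """Extra money spent on the larger-memory GPU over the one you'd buy anyway."""
    return larger_gpu_price - baseline_gpu_price


def weekly_hours_to_justify_upgrade(baseline_gpu_price, larger_gpu_price,
                                    hourly_rental_rate, years_of_use):
    """Hours per week of model use at which the premium equals the rental bill."""
    if hourly_rental_rate <= 0 or years_of_use <= 0:
        raise ValueError("hourly_rental_rate and years_of_use must be positive")
    premium = upgrade_premium(baseline_gpu_price, larger_gpu_price)
    total_rental_hours = premium / hourly_rental_rate
    return total_rental_hours / (years_of_use * WEEKS_PER_YEAR)


# Hypothetical: 400 for the standard card, 920 for the 24GB one,
# renting at 0.5 per hour, keeping the rig for 2 years.
print(weekly_hours_to_justify_upgrade(400, 920, 0.5, 2))
```

If you'd use it more hours per week than that figure, the upgrade pays for itself; below it, renting on demand comes out ahead.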