by rglullis 253 days ago
The one thing I don't understand is this assumption that demand for GPUs for training is going to keep growing at the rate it has grown so far.

I get the demand for new applications, which require inference, but nowadays with so many good (if not close to SOTA) models available for free and the ability to run them on consumer hardware (Apple M4 or AMD Max APUs), is there any demand for applications that justifies a crazy amount of investment in GPUs?

4 comments

Inference will be cheapest when run in a shared cloud environment, simply because of the roofline of LLM inference: batching many users' requests amortizes the cost of reading the weights. Thus, most B2B use cases are likely to be datacenter-based, like AWS today.
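To make the roofline argument above concrete, here is a minimal sketch of a decode-step lower bound. It assumes the weights are read from memory once per step, roughly 2 FLOPs per parameter per generated token, and it ignores KV-cache traffic and interconnect costs. The function names and the hardware numbers in `HW` are hypothetical round figures chosen for illustration, not measurements of any real device.

```python
def decode_step_seconds(batch, params, bytes_per_param, mem_bw, peak_flops):
    """Roofline lower bound for one decode step over `batch` sequences."""
    # Memory side: every weight is streamed once per step, regardless of batch size.
    memory_time = params * bytes_per_param / mem_bw
    # Compute side: about 2 FLOPs per parameter for each token in the batch.
    compute_time = 2 * params * batch / peak_flops
    # The step cannot finish faster than the slower of the two.
    return max(memory_time, compute_time)


def seconds_per_token(batch, **hw):
    """Amortized time per generated token at a given batch size."""
    return decode_step_seconds(batch, **hw) / batch


def crossover_batch(bytes_per_param, mem_bw, peak_flops, **_):
    """Batch size at which memory time equals compute time."""
    return bytes_per_param * peak_flops / (2 * mem_bw)


# Hypothetical accelerator and model: 8B parameters at 2 bytes each,
# 1 TB/s of memory bandwidth, 400 TFLOP/s of peak compute.
HW = dict(params=8e9, bytes_per_param=2, mem_bw=1e12, peak_flops=4e14)

if __name__ == "__main__":
    for batch in (1, 8, 64, 400, 800):
        print(batch, seconds_per_token(batch, **HW))
    print("crossover batch:", crossover_batch(**HW))
```

Under these assumptions a single user (batch of 1) leaves the device memory-bound, and the per-token time keeps falling as requests are batched until the compute roof is reached. That is the sense in which a shared deployment can undercut a single-user one; whether it does in practice depends on utilization, which this sketch does not model.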

Of course, CERN is still going to use FPGAs hyper-optimized for its specific trigger model for the LHC, and Apple is going to use a specialized low-power ASIC running a quantized model for "Hey Siri", but I meant the majority use case.

I do not buy this premise. I think it will end up being cheaper to simply run the LLMs directly on the user device.

I think that there are plenty of competitors in the "LLMs with open weights" space to essentially make the models a commodity, so all that is left is the compute cost and there is no way that someone will be running a datacenter in a way that is cheaper than "the computer that I already have running on my desk".

I make your point every time this comes up[1], but it's absolutely surprising how few business people, most of whom have some credibility in the form of qualifications or experience, actually recognise a value chain when they see it.

==========

[1] https://rundata.co.za/blog/index.html?the-ai-value-chain

Apologies for the second reply, but it also occurs to me that reinforcement learning is the new battleground. Look at the changes between o1, o3, and GPT-5 Thinking, or between Sonnet 3.7, Sonnet 4, and Sonnet 4.5, and so forth.

I expect models will get larger again once everyone is doing their inference on B200s, but the RL training budget is where the insatiable appetite sits right now.

Isn’t the whole point of the arms race that the more GPUs you have the closer you get to AGI? Which is the supposed goal here.

I do not believe for a second that any of those people investing tens of billions of dollars are doing it to "get to AGI". They would only be able to profit from an AGI if it could be simultaneously (a) weaponized and (b) strictly controlled by one party, and no one is crazy enough to believe that both could be achieved.

If you tell me that people are pouring all that money into data centers because they believe that most applications will use some form of LLM or VLM as the main driver of machine-to-machine and machine-to-person interfaces, I'd be more inclined to buy it. But then I'd respond that LLMs seem to be reaching a point of diminishing returns, and the next big move is to make it easier and faster to distill/fine-tune the LLMs for specific business needs, which is something that should be possible to do with the existing infra already (I guess?).

I suspect continuous learning will be the next driver of GPU usage.