by scellus 248 days ago
He writes as if only datacenters and network equipment remain after the AI bubble bursts. Like there won't be any AI models anymore, nothing left after the big training runs and trillion-dollar R&D, and no inference served.
3 comments

Who's going to pay to run those models? They are currently running at a huge loss.
Anthropic said their inference is cash positive. I would be very surprised if this isn't the norm.
As if inference exists in a bubble. Driving a car from point A to point B costs $0, as long as you exclude the cost of the car or the fuel you purchased before you were at point A.
I believe that, but equally it's so unverifiable that it's a point of faith.

I'm not suggesting it's an outright lie, but rather that it's easy to massage the costs to make it look true even if it isn't. E.g., does GPU cost go into inference cost or not?

I would be surprised if they are being honest.
I'd be more surprised if they didn't know their own business costs.
There's an accounting question of whether they count free tier inference as COGS or marketing.
Aren't they taking investor money? It would be a huge scandal if they were lying.
I can run quite useful models on my PC. Might not change the world, but I got a usable transcript of an old foreign-language TV show and then machine-translated it to English. It is not as good as professional subtitles, but I wasn't willing to pay the cost of that option.
I did something similar with Whisper a year or so ago.

9 years ago, when my now wife and I were dating, we took a long cross-country road trip, and for a lot of it, we listened to NPR's Ask Me Another (a comedy trivia game).

Anyway, on one random episode, there was a joke in the show that just perfectly fit what we were doing at that exact moment. We laughed and laughed and soon forgot about it.

Years later, I wanted to find that again and purposely recreate the same moment.

I downloaded all 300 episodes as MP3s. I used Whisper to generate text transcripts, followed by a little bit of grepping, and I found the one 4-second joke that otherwise would have been lost to time.
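For anyone wanting to try the same thing, here is a minimal sketch of that transcribe-then-grep workflow. It assumes the open-source `openai-whisper` Python package is installed; the directory names, the `base` model size, and both helper function names are illustrative assumptions, not what the commenter actually ran.

```python
import re
from pathlib import Path


def transcribe_episodes(audio_dir, transcript_dir, model_name="base"):
    """Write one .txt transcript per .mp3 file (assumes `pip install openai-whisper`)."""
    import whisper  # imported lazily so the search helper works without it

    model = whisper.load_model(model_name)
    out = Path(transcript_dir)
    out.mkdir(parents=True, exist_ok=True)
    for mp3 in sorted(Path(audio_dir).glob("*.mp3")):
        result = model.transcribe(str(mp3))
        # Keep the segment start time so a hit can be located in the audio.
        lines = [
            f"[{seg['start']:.0f}s] {seg['text'].strip()}"
            for seg in result["segments"]
        ]
        (out / f"{mp3.stem}.txt").write_text("\n".join(lines), encoding="utf-8")


def find_phrase(transcript_dir, pattern):
    """Return (file name, line) pairs whose line matches the regex, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for txt in sorted(Path(transcript_dir).glob("*.txt")):
        for line in txt.read_text(encoding="utf-8").splitlines():
            if regex.search(line):
                hits.append((txt.name, line))
    return hits
```

Something like `transcribe_episodes("episodes", "transcripts")` followed by `find_phrase("transcripts", r"road trip")` would be the equivalent of the download, Whisper, and grep steps; plain `grep -ri` over the transcript folder works just as well for the last step.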

Now, at the price you paid to retrieve that memory, is it a viable business model?
I downloaded 2GiB of data and let a script run for 56 hours. Besides a bit of my time, which I found to be enjoyable, it didn't cost me anything.

Maybe you could argue it cost some electricity, but in reality it meant my computer, which runs 24/7 pulling ~185 W, was running at ~300 W for 56 hours. Thus: (300 W - 185 W) × 56 h = 6.44 kWh, and at $0.13 per kWh that's roughly $0.85 + tax.
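The electricity arithmetic above is easy to check with a few lines of Python. This is only a sketch of the marginal-cost reasoning in the comment; the function name is made up for illustration, and the inputs are the figures quoted in the comment.

```python
def marginal_energy_cost(idle_watts, load_watts, hours, price_per_kwh):
    """Cost of the extra power drawn above idle, in the currency of price_per_kwh."""
    extra_kwh = (load_watts - idle_watts) * hours / 1000.0
    return extra_kwh, extra_kwh * price_per_kwh


# Figures quoted in the comment: ~185 W idle, ~300 W under load, 56 hours, $0.13/kWh.
kwh, cost = marginal_energy_cost(185, 300, 56, 0.13)
print(f"{kwh:.2f} kWh, ${cost:.2f} before tax")  # 6.44 kWh, $0.84 before tax
```

The exact product is about $0.84, consistent with the rounded figure in the comment once tax is added.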

So... Yes, it was very much worth $0.85 to make my wife happy.

It's a little bit more complicated than that if you were running a business.

You would want to add the cost of your network+hardware depreciating over the timeframe, and you probably can't just ignore the first 185W since if you are Anthropic it doesn't seem likely that the idle power draw would be needed if they weren't expecting to serve AI traffic.

So, let's say $0.02 per hour ($1/50 roughly). That's about $15 per month per user. Let's call it $10 per month per user since users aren't constantly hammering the service. To support a big sales and marketing engine, you would like to be selling subscriptions for $100+ per month. I'm just not sure people are prepared to pay that for AI in its current form.
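A rough sketch of that per-user estimate, for anyone who wants to plug in their own numbers. The $0.02 per hour figure is the commenter's guess; the 30-day month and the 70% utilization factor are assumptions chosen here only to reproduce the "about $15" and "call it $10" steps.

```python
def monthly_cost_per_user(hourly_cost, utilization=1.0, hours_per_day=24, days=30):
    """Serving cost per user per month, scaled by the fraction of time actually in use."""
    return hourly_cost * hours_per_day * days * utilization


full = monthly_cost_per_user(0.02)                       # user hammering the service 24/7
typical = monthly_cost_per_user(0.02, utilization=0.7)   # assumed 70% utilization
print(f"${full:.2f} at full load, ${typical:.2f} at 70% utilization")
# $14.40 at full load, $10.08 at 70% utilization
```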

Damn, I hope you realise how cheap that electricity is.
"We will be left with local models that can be sort of useful but also sort of suck" is not really a great proposition for the obscene amount of money being invested in this.
Won’t those models gradually become outdated (for anything related to events that happen after the model was trained, new programming languages or framework versions, etc.) if no one is around to continually re-train them?
They should be fine for things that don't change. (which is a lot of stuff)

If you are feeding the LLM a report, and asking it for a summary, it doesn't need the latest updates from Wikipedia or Reddit.

There's a gazillion use cases for these things in business that aren't even beginning to be tapped yet. Demand for tokens should be practically unlimited for many years to come. Some of those ideas won't be financially viable but a lot will.

Consider how much software is out there that can now be translated into every (human) language continuously, opening up new customers and markets that were previously being ignored due to the logistical complexity and cost of hiring human translation teams. Inferencing that stuff is a no brainer but there's a lot of workflow and integration needed first which takes time.

Running the models is cheap. That will be worthwhile even if the bubble pops hard. Not for all of the silly stuff we do today, but for some of it.

Creating new LLMs might be out of reach for all but very well-capitalized organizations with clear intentions, and governments.

There might be a viable market for SLMs though. Why does my model need to know about the Boer wars to generate usable code?

Perhaps surprisingly, considering the current stratospheric prices of GPUs, the performance-per-dollar of compute is still rising faster than exponentially. In a handful of years it will be cheap to train something as powerful as the models that cost millions to train today. Algorithmic efficiencies also stack up and make it cheaper to build and serve older models even on the same hardware.

It’s underappreciated that we would already be in a pretty absurdly wild tech trajectory just due to compute hyperabundance even without AI.

They're not running at a loss. Training runs at a loss, but the models are profitable to serve if you don't need to continuously train new models.
But you do or you're missing current events, right?
Not at all, otherwise models with knowledge cutoffs of six months to a year ago (all current SOTA models) would be useless. Current information is fed into the model as part of the prompt. This is why they use web search.

The main reason they train new models is to make them bigger and better using the latest training techniques, not to update them with the latest knowledge.
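To make "fed into the model as part of the prompt" concrete, here is a minimal sketch of how retrieved text can be placed in front of a question. The function name, the prompt wording, and the character budget are all assumptions for illustration; real products use their own search backends and prompt formats.

```python
def build_prompt(question, snippets, max_chars=2000):
    """Prepend retrieved snippets to a question, stopping at a rough character budget."""
    context_lines = []
    used = 0
    for i, snippet in enumerate(snippets, start=1):
        entry = f"[{i}] {snippet.strip()}"
        if used + len(entry) > max_chars:
            break  # drop whatever no longer fits in the budget
        context_lines.append(entry)
        used += len(entry)
    context = "\n".join(context_lines) if context_lines else "(no context found)"
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The model's weights stay frozen at the training cutoff; only the text passed in `snippets` (for example, fresh web search results) carries the current information.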

I'm trying to avoid getting into the habit of asking LLMs about current events, or really any events. Or really facts at all.

I think LLMs work best when you give them data and ask them to try to make sense of it, or find something interesting, or spot some problem. To see something I can't see. Then I can go back to the original data and make sure it's true.

There are a number of techniques to modify a model post-training. Some of those techniques allow adding current events to the model's "knowledge" without having to do an entire from-scratch training run, saving money.
They are obviously running free users at a loss. Can you point to evidence of negative margins on subscriptions and enterprise contracts?
The models get more efficient every year and consumer chips get more capable every year. A GPT-5 level model will be on every phone running locally in 5 years.
Why such a reaction to this statement? Is this not the track we're on?
Can I sign up for an alternative future please? This one sounds horrendous.
I run models for coding on my own machines. They’re a trivial expense compared to what I earn from the work I do.

The “at a loss” scenario comes from (1) training costs and (2) companies selling tokens below market to get market share. Neither of those implies that people won’t run models in future. Training new frontier-class models could potentially become an issue, but even that seems unlikely given what these models are capable of.

It's unclear if people would pay the price to use them if they were not below market.

I have access to quite a few models, and I use them here and there. They are sort of useful, sometimes. But I don't pay directly for any of them. Honestly, I wouldn't.

Ok, running them locally, that's definitely a thing.

But then, without this huge financial and tech bubble that's driven by these huge companies:

1/ will those models evolve, or new models appear, for a fraction of the cost of building them today?

2/ will GPUs (or their replacements) also cost a fraction of what they cost today, so that they are still integrated in end-user processors, so that those models can run efficiently?

Given the popularity, activity, and pace of innovation seen on /r/LocalLLaMA, I do think models will keep improving. Likely not at the same pace as they are today, but those people love tinkering. It's mostly enthusiasts with a budget for a fancy setup in a garage, independent researchers, and smaller businesses doing research there.

These people won't sit still and models will keep getting better as well as cheaper to run.

No one on LocalLLaMA is training their own models. They’re working with foundation models like Llama from Meta and tweaking them in various ways: fine-tuning, quantizing, RAG, etc. There’s a limit to how much improvement can be made like that. The basic capabilities of the foundation model still constrain what’s possible.
Is there a genuine use case for today's models, other than for identifying suckers? You can't even systematically apply an LLM to a list of text transformation tasks, because the ability to produce consistent results would make them less effective sycophants.
The point is, after the bubble bursts, will there be enough funds, cash flow and... a viable market, to keep these running?
Inference is not that expensive. I'd argue that most models are already useful enough that people will pay to run them.
At $20/month for Claude, I'm satisfied. I'll keep paying that for what I get from it, even if it never improves again.
Of course, but my point is that I don't think it's economically sustainable. If innovation/funding in AI stalls, those $20 will likely skyrocket fast.