HN Mirror

	by rjh29 248 days ago
	Who's going to pay to run those models? They are currently running at a huge loss.

8 comments

WalterSear 248 days ago

Anthropic said their inference is cash positive. I would be very surprised if this isn't the norm.

timmytokyo 248 days ago

As if inference exists in a bubble. Driving a car from point A to point B costs $0, as long as you exclude the cost of the car or the fuel you purchased before you were at point A.

rich_sasha 248 days ago

I believe that; equally, it's so unverifiable that it's a matter of faith.

I'm not suggesting it's an outright lie, but rather that it's easy to massage the costs to make it look true even if it isn't. E.g., does the cost of the GPUs go into the inference cost or not?

surgical_fire 248 days ago

I would be surprised if they are being honest.

squidbeak 248 days ago

I'd be more surprised if they didn't know their own business costs.

wmf 248 days ago

There's an accounting question of whether they count free-tier inference as COGS (cost of goods sold) or as marketing.
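
To make the accounting point concrete, here is a minimal Python sketch with made-up numbers (none of these figures come from any company's reports; the `Quarter` fields are hypothetical). The same quarter shows a very different gross margin depending on whether free-tier inference is booked as cost of goods sold or as a marketing expense.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    """Hypothetical quarterly figures in dollars (illustrative only)."""
    paid_revenue: float
    paid_inference_cost: float
    free_inference_cost: float


def gross_margin(q: Quarter, free_tier_as_cogs: bool) -> float:
    """Gross margin as a fraction of paid revenue.

    If `free_tier_as_cogs` is True, free-tier inference is counted as cost of
    goods sold; otherwise it is treated as marketing and left out of COGS.
    """
    cogs = q.paid_inference_cost
    if free_tier_as_cogs:
        cogs += q.free_inference_cost
    return (q.paid_revenue - cogs) / q.paid_revenue


q = Quarter(paid_revenue=100.0, paid_inference_cost=40.0, free_inference_cost=30.0)
print(f"free tier as COGS:      {gross_margin(q, True):.0%}")   # 30%
print(f"free tier as marketing: {gross_margin(q, False):.0%}")  # 60%
```

With these placeholder inputs the reported margin doubles from 30% to 60% purely through the classification choice; no underlying cost changes.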

tayo42 248 days ago

Aren't they taking investor money? Wouldn't it be a huge scandal if they were lying?

harvey9 248 days ago

I can run quite useful models on my PC. It might not change the world, but I got a usable transcript of an old foreign-language TV show and then machine-translated it into English. It is not as good as professional subtitles, but I wasn't willing to pay the cost of that option.

mmh0000 248 days ago

I did something similar with Whisper a year or so ago.

Nine years ago, when my now-wife and I were dating, we took a long cross-country road trip, and for a lot of it we listened to NPR's Ask Me Another (a comedy trivia game).

Anyway, on one random episode, there was a joke in the show that just perfectly fit what we were doing at that exact moment. We laughed and laughed and soon forgot about it.

Years later, I wanted to find that again and purposely recreate the same moment.

I downloaded all 300 episodes as MP3s. I used Whisper to generate text transcripts, followed by a little bit of grepping, and I found the one 4-second joke that otherwise would have been lost to time.
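
A sketch of the search step in Python. It assumes each episode has already been transcribed into the segment shape that the openai-whisper package's `transcribe()` returns (a list of dicts with `"start"`, `"end"` and `"text"` keys), for example via `whisper.load_model("base").transcribe(path)["segments"]`; that step is not run here, and the function name and episode data below are hypothetical. Unlike plain grep over text files, keeping segments preserves timestamps, so a hit tells you where in the MP3 to seek.

```python
from __future__ import annotations


def find_phrase(transcripts: dict[str, list[dict]], phrase: str) -> list[tuple[str, float, str]]:
    """Return (episode, start_seconds, segment_text) for each segment containing `phrase`.

    Matching is case-insensitive and works per segment, so a phrase that is
    split across two segments will be missed.
    """
    needle = phrase.lower()
    hits = []
    for episode, segments in sorted(transcripts.items()):
        for seg in segments:
            if needle in seg["text"].lower():
                hits.append((episode, seg["start"], seg["text"].strip()))
    return hits


# Hypothetical, hand-written transcripts in Whisper's segment shape.
transcripts = {
    "episode_042.mp3": [
        {"start": 0.0, "end": 4.2, "text": " Welcome back to the show."},
        {"start": 812.5, "end": 816.1, "text": " Nobody road-trips like that!"},
    ],
    "episode_007.mp3": [
        {"start": 30.0, "end": 33.0, "text": " Our next contestant is here."},
    ],
}

for episode, start, text in find_phrase(transcripts, "road-trips"):
    print(f"{episode} @ {start:.1f}s: {text}")
```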

jkestner 248 days ago

Now, at the price you paid to retrieve that memory, is it a viable business model?

mmh0000 248 days ago

I downloaded 2GiB of data and let a script run for 56 hours. Besides a bit of my time, which I found to be enjoyable, it didn't cost me anything.

Maybe you could argue it cost some electricity, but in reality it meant my computer, which runs 24/7 pulling ~185W, was running at ~300W for 56 hours. That works out to (300W - 185W) × 56h = 115W × 56h = 6.44kWh, and at $0.13 per kWh that's about $0.84 + tax.

So... yes, it was very much worth $0.84 to make my wife happy.
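
The same back-of-the-envelope calculation as a small Python function (the function name is ours). It counts only the extra draw above the idle baseline, which is the commenter's framing; passing `idle_watts=0` instead charges the full draw, which is closer to what a business running dedicated hardware for the workload would pay. Unrounded, 6.44 kWh × $0.13 comes to about $0.837.

```python
def marginal_energy_cost(idle_watts: float, load_watts: float,
                         hours: float, price_per_kwh: float) -> float:
    """Dollar cost (before tax) of the power drawn above `idle_watts` for `hours`."""
    extra_kwh = (load_watts - idle_watts) * hours / 1000  # watt-hours -> kWh
    return extra_kwh * price_per_kwh


# Figures from the comment: ~185 W idle, ~300 W under load, 56 hours, $0.13/kWh.
extra = marginal_energy_cost(idle_watts=185, load_watts=300, hours=56, price_per_kwh=0.13)
full = marginal_energy_cost(idle_watts=0, load_watts=300, hours=56, price_per_kwh=0.13)
print(f"marginal:  ${extra:.2f}")  # marginal:  $0.84
print(f"full draw: ${full:.2f}")   # full draw: $2.18
```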

kcexn 248 days ago

It's a little bit more complicated than that if you were running a business.

You would want to add the cost of your network and hardware depreciating over the timeframe, and you probably can't just ignore the first 185W either: if you are Anthropic, that idle power draw presumably wouldn't exist if you weren't expecting to serve AI traffic.

So, let's say $0.02 per hour (roughly $1 per 50 hours). That's about $15 per month per user. Let's call it $10 per month per user, since users aren't constantly hammering the service. To support a big sales and marketing engine, you would want to be selling subscriptions for $100+ per month. I'm just not sure people are prepared to pay that for AI in its current form.
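
The step from an hourly figure to a monthly one, as a Python sketch. The `utilization` parameter is an assumption introduced here to express "users aren't constantly hammering the service"; 0.7 is an illustrative value that lands near the $10 figure, not a measured one.

```python
HOURS_PER_MONTH = 24 * 30  # assume a 30-day month


def monthly_cost_per_user(hourly_cost: float, utilization: float = 1.0) -> float:
    """Serving cost per user per month.

    `utilization` is the (assumed) fraction of the month the user keeps the
    hardware busy; 1.0 means around the clock.
    """
    return hourly_cost * HOURS_PER_MONTH * utilization


print(round(monthly_cost_per_user(0.02), 2))       # 14.4  -> "about $15"
print(round(monthly_cost_per_user(0.02, 0.7), 2))  # 10.08 -> "call it $10"
```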

fennecbutt 248 days ago

Damn, I hope you realise how cheap that electricity is.

surgical_fire 248 days ago

"we will be left with local models that can be sort of useful but also sort of suck" is not really a great proposition for the obscene amount of money being invested in this.

joshuahedlund 248 days ago

Won’t those models gradually become outdated (for anything related to events that happen after the model was trained, new programming languages or framework versions, etc.) if no one is around to continually retrain them?

jay_kyburz 248 days ago

They should be fine for things that don't change (which is a lot of stuff).

If you are feeding the LLM a report, and asking it for a summary, it doesn't need the latest updates from Wikipedia or Reddit.

mike_hearn 248 days ago

There's a gazillion use cases for these things in business that aren't even beginning to be tapped yet. Demand for tokens should be practically unlimited for many years to come. Some of those ideas won't be financially viable but a lot will.

Consider how much software is out there that can now be translated into every (human) language continuously, opening up new customers and markets that were previously ignored due to the logistical complexity and cost of hiring human translation teams. Running inference on all that is a no-brainer, but a lot of workflow and integration work is needed first, which takes time.
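
One piece of that workflow, sketched in Python: keep a translation catalog in sync by hashing each source string and re-translating only keys that are new or whose text changed, so "continuous" translation spends tokens only on what actually changed. The catalog layout and the `translate` callback are hypothetical stand-ins for whatever model call or service you would use.

```python
from __future__ import annotations

import hashlib
from typing import Callable


def fingerprint(text: str) -> str:
    """Stable hash of a source string, used to detect edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_translations(source: dict[str, str],
                      catalog: dict[str, tuple[str, str]],
                      translate: Callable[[str], str]) -> int:
    """Bring `catalog` (key -> (source_hash, translation)) up to date in place.

    Only new keys and keys whose source text changed are sent to `translate`;
    keys that disappeared from `source` are dropped. Returns the number of
    strings translated.
    """
    sent = 0
    for key, text in source.items():
        digest = fingerprint(text)
        cached = catalog.get(key)
        if cached is None or cached[0] != digest:
            catalog[key] = (digest, translate(text))
            sent += 1
    for key in list(catalog):
        if key not in source:
            del catalog[key]
    return sent


def fake_translate(s: str) -> str:
    """Stand-in "translator" so the sketch runs without any model or API."""
    return f"<fr>{s}</fr>"


strings = {"greeting": "Hello", "farewell": "Goodbye"}
catalog: dict[str, tuple[str, str]] = {}
print(sync_translations(strings, catalog, fake_translate))  # 2: everything is new
print(sync_translations(strings, catalog, fake_translate))  # 0: nothing changed
strings["greeting"] = "Hi there"
print(sync_translations(strings, catalog, fake_translate))  # 1: only the edited string
```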

quesera 248 days ago

Running the models is cheap. That will be worthwhile even if the bubble pops hard. Not for all of the silly stuff we do today, but for some of it.

Creating new LLMs might be out of reach for all but very well-capitalized organizations with clear intentions, and governments.

There might be a viable market for SLMs (small language models), though. Why does my model need to know about the Boer Wars to generate usable code?

mordymoop 248 days ago

Perhaps surprisingly, considering the current stratospheric prices of GPUs, the performance-per-dollar of compute is still rising faster than exponentially. In a handful of years it will be cheap to train something as powerful as the models that cost millions to train today. Algorithmic efficiencies also stack up and make it cheaper to build and serve older models, even on the same hardware.

It’s underappreciated that we would already be in a pretty absurdly wild tech trajectory just due to compute hyperabundance even without AI.
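
The "cheap in a handful of years" argument is compounding arithmetic. A Python sketch, assuming a constant halving time for hardware cost (plain exponential decline, i.e. more conservative than the "faster than exponential" claim, which would need a shrinking halving time) and, optionally, an independent halving time for algorithmic efficiency. The $10M starting cost and the two-year halving times are placeholders, not measured trends.

```python
from __future__ import annotations


def future_cost(cost_today: float, years: float, hw_halving_years: float,
                algo_halving_years: float | None = None) -> float:
    """Cost of reproducing today's workload `years` from now.

    Assumes hardware cost for a fixed amount of compute halves every
    `hw_halving_years` and, if given, that algorithmic improvements
    independently halve the compute needed every `algo_halving_years`.
    """
    factor = 0.5 ** (years / hw_halving_years)
    if algo_halving_years is not None:
        factor *= 0.5 ** (years / algo_halving_years)
    return cost_today * factor


# Placeholder inputs: a $10M training run, both halving times set to 2 years.
for years in (2, 4, 6):
    hw_only = future_cost(10_000_000, years, 2.0)
    combined = future_cost(10_000_000, years, 2.0, algo_halving_years=2.0)
    print(f"{years} years: ${hw_only:,.0f} (hardware only), ${combined:,.0f} (with algorithms)")
```

Because the two effects multiply, the combined curve falls twice as fast in log terms as either one alone.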

logicchains 248 days ago

They're not running at a loss. Training runs at a loss, but the models are profitable to serve if you don't need to continuously train new models.

jayd16 248 days ago

But you do or you're missing current events, right?

dcre 248 days ago

Not at all, otherwise models with knowledge cutoffs of six months to a year ago (all current SOTA models) would be useless. Current information is fed into the model as part of the prompt. This is why they use web search.

The main reason they train new models is to make them bigger and better using the latest training techniques, not to update them with the latest knowledge.
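
A minimal Python sketch of what "fed into the model as part of the prompt" means: fresh text, for example from a web search run just before the call, is pasted into the prompt alongside the question, so nothing has to be retrained. The layout and the `build_prompt` name are assumptions for illustration; real products use their own templates and retrieval pipelines.

```python
from __future__ import annotations

from datetime import date


def build_prompt(question: str, snippets: list[tuple[str, str]], today: date) -> str:
    """Put fresh, retrieved text in front of the question.

    `snippets` are (source_url, text) pairs fetched just before the call,
    e.g. from a web search. The model reads them in context.
    """
    parts = [
        f"Today's date is {today.isoformat()}.",
        "Answer using the numbered sources below and cite them like [1].",
        "",
    ]
    for i, (url, text) in enumerate(snippets, start=1):
        parts.append(f"[{i}] {url}")
        parts.append(text.strip())
        parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


prompt = build_prompt(
    "What changed in the latest release?",
    [("https://example.com/release-notes", "Version 2.0 drops support for the old config format.")],
    today=date(2025, 1, 15),
)
print(prompt)
```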

jay_kyburz 248 days ago

I'm trying to avoid getting into the habit of asking LLMs about current events, or really any events. Or really facts at all.

I think LLMs work best when you give them data and ask them to try to make sense of it, find something interesting, or spot some problem: to see something I can't see. Then I can go back to the original data and make sure it's true.
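
One cheap way to do the "go back to the original data" step, sketched in Python under the assumption that you ask the model to back its claims with verbatim quotes: extract every double-quoted passage from the answer and flag those that don't appear in the source. The function name is hypothetical, and this only catches misquotes, not wrong conclusions drawn from correct quotes.

```python
from __future__ import annotations

import re


def unsupported_quotes(answer: str, source: str) -> list[str]:
    """Return the double-quoted passages in `answer` that are not found in `source`.

    Comparison ignores case and collapses runs of whitespace, so line breaks
    in the source don't cause false alarms.
    """
    def normalize(s: str) -> str:
        return " ".join(s.split()).lower()

    haystack = normalize(source)
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes if normalize(q) not in haystack]


report = "Revenue rose 12% in Q3.\nOperating costs were flat."
answer = 'The report says "revenue rose 12%" and "operating costs fell".'
print(unsupported_quotes(answer, report))  # ['operating costs fell']
```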

fragmede 248 days ago

There are a number of techniques to modify a model post-training. Some of those techniques allow adding current events to the model's "knowledge" without having to do an entire from-scratch training run, saving money.
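
The comment doesn't name a technique; one widely used example is LoRA (low-rank adaptation), where the original weight matrix `W` is frozen and a small low-rank correction `B @ A` is trained on top of it. The pure-Python sketch below (class and helper names are ours) shows only the forward pass and the parameter savings, not a training loop, and says nothing about how reliably such updates add factual knowledge.

```python
from __future__ import annotations

import random


def matmul(X: list[list[float]], Y: list[list[float]]) -> list[list[float]]:
    """(n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(M: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*M)]


class LoRALinear:
    """A frozen linear layer W (d_out x d_in) plus a trainable update scale * B @ A.

    A is (rank x d_in) and B is (d_out x rank), so only rank * (d_in + d_out)
    numbers would be trained instead of d_out * d_in.
    """

    def __init__(self, W: list[list[float]], rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = len(W), len(W[0])
        rng = random.Random(seed)
        self.W = W  # frozen pretrained weights
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init: starts as a no-op
        self.scale = alpha / rank

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def __call__(self, x: list[list[float]]) -> list[list[float]]:
        """x is (batch x d_in); returns (batch x d_out) = x W^T + scale * x A^T B^T."""
        base = matmul(x, transpose(self.W))
        delta = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [[b + self.scale * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3x2 "pretrained" weights
layer = LoRALinear(W, rank=1)
print(layer([[2.0, 3.0]]))       # [[2.0, 3.0, 5.0]]: identical to W alone before training
print(layer.trainable_params())  # 5 (= 1 * (2 + 3))
```

At realistic sizes the savings are large: for a 4096 x 4096 layer with rank 8, the adapter has 65,536 trainable numbers versus 16,777,216 in `W`, about 0.4%.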

dcre 248 days ago

They are obviously running free users at a loss. Can you point to evidence of negative margins on subscriptions and enterprise contracts?

qgin 248 days ago

The models get more efficient every year and consumer chips get more capable every year. A GPT-5 level model will be on every phone running locally in 5 years.
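
The memory side of that prediction is easy to estimate: the weights alone take parameters × bits per weight ÷ 8 bytes. A Python sketch; the parameter counts below are placeholders rather than the size of any particular model, and the estimate ignores the KV cache, activations and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Gigabytes (10**9 bytes) needed just to store the weights."""
    return num_params * bits_per_weight / 8 / 1e9


# Illustrative sizes only.
for params in (3e9, 8e9, 70e9):
    row = ", ".join(f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB" for bits in (16, 8, 4))
    print(f"{params / 1e9:.0f}B params -> {row}")
```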

qgin 248 days ago

Why such a reaction to this statement? Is this not the track we're on?

swarnie 248 days ago

Can I sign up for an alternative future, please? This one sounds horrendous.

antonvs 248 days ago

I run models for coding on my own machines. They’re a trivial expense compared to what I earn from the work I do.

The “at a loss” scenario comes from (1) training costs and (2) companies selling tokens below market to gain market share. Neither of those implies that people won’t run models in future. Training new frontier-class models could potentially become an issue, but even that seems unlikely given what these models are capable of.

surgical_fire 248 days ago

It's unclear if people would pay the price to use them if they were not below market.

I have access to quite a few models, and I use them here and there. They are sort of useful, sometimes. But I don't pay directly for any of them. Honestly, I wouldn't.

Juliate 248 days ago

Ok, running them locally, that's definitely a thing.

But then, without this huge financial and tech bubble that's driven by these huge companies:

1/ will those models evolve, or new models appear, for a fraction of the cost of building them today?

2/ will GPUs (or their replacements) also cost a fraction of what they cost today, so that they keep being integrated into end-user processors and those models can run efficiently?

azeirah 248 days ago

Given the popularity, activity and pace of innovation seen on /r/LocalLLaMA, I do think models will keep improving, though likely not at the same pace as today. Those people love tinkering; it's mostly enthusiasts with a budget for a fancy setup in a garage, independent researchers and smaller businesses doing research there.

These people won't sit still and models will keep getting better as well as cheaper to run.

antonvs 248 days ago

No one on LocalLLaMA is training their own models. They’re working with foundation models like Meta's Llama and tweaking them in various ways: fine-tuning, quantizing, RAG, etc. There’s a limit to how much improvement can be made that way. The basic capabilities of the foundation model still constrain what’s possible.
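
Quantizing, one of the tweaks listed above, in its simplest form: each 32-bit float weight is replaced by an 8-bit integer plus one shared scale factor, cutting weight memory roughly 4x at the cost of a small rounding error. This Python sketch covers symmetric per-tensor int8 quantization only (function names are ours); the schemes commonly used for local models, such as per-channel or group-wise scales and 4-bit formats, are more elaborate.

```python
from __future__ import annotations


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [scale * v for v in q]


weights = [0.5, -1.0, 0.25, 0.0, 0.013]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(q)
print(f"scale={scale:.6f}, worst error={worst:.6f}")  # error is at most scale / 2
```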