| HN Mirror


	by rudedogg 29 days ago
	If Google is actually getting cheaper inference than everyone else with their TPUs, this smells like trouble to me. Maybe serving LLMs at a profit is proving difficult. Or maybe they think that because their benchmarks are good they can ramp up the prices. To me, it seems like they don’t have the market share to justify a move like that yet.

5 comments

tempaccount420 29 days ago

This is not priced at inference cost.

My guess: it's the price at which they make more money than if they rent the TPUs to other companies.

The Gemini team has had trouble securing enough TPUs for their users' needs. They struggle with load and their rate limits are really bad. Maybe at a higher price, they have a better chance at getting more TPUs assigned?

gpm 29 days ago

The price at which they could rent out the TPUs, i.e. the market rate, is the inference cost.

Just because you are vertically integrated doesn't mean you get to discount one business unit's products to another. Doing so ignores the opportunity cost you pay and is just bad accounting.
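
A minimal sketch of the opportunity-cost argument, in Python. Every number here is made up for illustration (the rental rate, the internal cost, and the throughput are assumptions, not figures from Google or from this thread). It only shows why the same TPU hour gives two different "costs" depending on whether you count what you spent or what you gave up.

```python
# Hypothetical numbers for illustration only; none come from the thread.
MARKET_RENT_PER_TPU_HOUR = 3.00    # assumed price an external customer would pay
INTERNAL_COST_PER_TPU_HOUR = 1.20  # assumed power, depreciation, operations
TOKENS_PER_TPU_HOUR = 2_000_000    # assumed serving throughput


def cost_per_million_tokens(cost_per_tpu_hour: float) -> float:
    """Convert an hourly TPU cost into a cost per one million tokens served."""
    return cost_per_tpu_hour / TOKENS_PER_TPU_HOUR * 1_000_000


# What the books show if the model team is charged only the internal cost.
accounting_cost = cost_per_million_tokens(INTERNAL_COST_PER_TPU_HOUR)

# What the company gives up by not renting the same hour at the market rate.
opportunity_cost = cost_per_million_tokens(MARKET_RENT_PER_TPU_HOUR)

print(f"internal accounting cost: ${accounting_cost:.2f} per 1M tokens")
print(f"opportunity cost:         ${opportunity_cost:.2f} per 1M tokens")
```

Under these assumed numbers, pricing tokens anywhere between the two figures looks profitable on the internal books while earning less than simply renting out the hardware.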

KoolKat23 29 days ago

Basic business principle: you charge what people are willing to pay, not what it costs.

HDThoreaun 29 days ago

Depends on whether you have spare capacity, I think. They have minimal competition, so they might be maximizing profit by charging prices higher than what clears all their supply.
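
A rough sketch of how that can happen, assuming a simple linear demand curve and a seller with little competition. The demand parameters, unit cost, and capacity are all hypothetical. With linear demand the unconstrained profit-maximizing quantity is (A - cost) / (2B), which can be lower than what the seller is able to serve.

```python
# Hypothetical linear demand: price = A - B * quantity. All numbers are made up.
A = 10.0         # assumed price at which demand falls to zero
B = 1.0          # assumed price drop per extra unit sold
UNIT_COST = 2.0  # assumed marginal cost of serving one unit
CAPACITY = 6.0   # assumed maximum units the seller can serve


def price_at(quantity: float) -> float:
    """Price the market will pay when this many units are offered."""
    return A - B * quantity


def profit_at(quantity: float) -> float:
    """Profit from selling this many units at the price the market will pay."""
    return (price_at(quantity) - UNIT_COST) * quantity


# Unconstrained optimum for linear demand is (A - UNIT_COST) / (2 * B);
# the seller cannot serve more than its capacity.
best_quantity = min(CAPACITY, (A - UNIT_COST) / (2 * B))

print("profit-maximizing price:", price_at(best_quantity))
print("capacity-clearing price:", price_at(CAPACITY))
print("idle capacity:", CAPACITY - best_quantity)
```

With these assumed numbers the seller earns more by leaving some capacity idle at a higher price than by cutting the price until everything sells.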

sumedh 29 days ago

> doesn't mean you get to discount one business unit's products to another

That depends. If all developers get used to Claude and Codex, it will become harder for Google to attract them in the future.

They might lose devs in the long term.

gpm 29 days ago

Predatory pricing is a great business strategy and all (particularly when countering competitors' predatory pricing - what could go wrong), but that doesn't mean that the Gemini team should account for it as if they're getting the compute cheaper. It just means that they should run a loss.

flaburgan 29 days ago

That's actually where AI differs: there is no network effect. So no reason for me to stay with a tool if suddenly another one is better or cheaper. Changing the model I use is literally two clicks in Zed. No retention possible for providers.

dash2 29 days ago

Look up “double marginalisation”.
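
For anyone who doesn't want to look it up, here is a small sketch of the textbook result, assuming linear demand and made-up numbers. When an upstream unit (say, compute) and a downstream unit (say, the model API) each add their own monopoly markup, the final price ends up higher and total profit lower than if one integrated firm priced off the true marginal cost. The function names and parameters are hypothetical.

```python
# Hypothetical linear demand: retail_price = A - B * quantity. Numbers are made up.
A = 10.0            # assumed price at which demand falls to zero
B = 1.0             # assumed price drop per extra unit sold
COMPUTE_COST = 2.0  # assumed true marginal cost of the upstream unit


def integrated():
    """One firm owns both stages and prices off the true compute cost."""
    quantity = (A - COMPUTE_COST) / (2 * B)
    price = A - B * quantity
    return price, quantity, (price - COMPUTE_COST) * quantity


def separated():
    """Upstream sets a wholesale price, then downstream adds its own markup."""
    # Downstream best response to wholesale price w: q = (A - w) / (2B).
    # Upstream maximizes (w - COMPUTE_COST) * (A - w) / (2B),
    # which gives w = (A + COMPUTE_COST) / 2.
    wholesale = (A + COMPUTE_COST) / 2
    quantity = (A - wholesale) / (2 * B)
    price = A - B * quantity
    combined_profit = (price - COMPUTE_COST) * quantity  # both units together
    return price, quantity, combined_profit


print("integrated (price, quantity, profit):", integrated())
print("separated  (price, quantity, profit):", separated())
```

In this toy setup, stacking the two markups pushes the retail price up and shrinks the combined profit, which is the usual argument for not treating an internal transfer exactly like an arm's-length sale.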

BoorishBears 29 days ago

This is trouble if you're not Google/OpenAI/Anthropic: they're all shifting towards pricing for the economic value of the knowledge work they're aiding.

The economic value increases non-linearly as models get more intelligent: being 10% more capable unlocks way more than 10% in downstream value.

That's trouble because the non-linear component means at some point their margins will stop being primarily defined by the cost of compute, and start being dominated by how intelligent the model is.

At that point you can expect compute prices to skyrocket and free capacity to plummet, so even if you have a model that's "good enough", you can't afford to deploy it at scale.

(and in terms of timing, I think they're all well under the curve for pricing by economic value. Everyone is talking about Uber spending millions on tokens, but how much payroll did they pay while devs scrolled their phones and waited for CC to do their job?)

tskj 29 days ago

Thank you, this is obviously where we're heading. People who think in terms of "will it ever be profitable to sell tokens" are thinking in the wrong framework entirely. The correct framework is "will it be profitable to sell knowledge work", and the answer will clearly be "yes".

spyckie2 29 days ago

It's probably that in 1 or 2 years local (free) models will completely take the place of cheap models, so cheap models need to move up the quality chain.

You have free local models for most tasks, $20 subscriptions for near-frontier intelligence, and API per token costs for frontier intelligence.

Flash seems to be targeting the near-frontier category.

TurdF3rguson 29 days ago

That might work if it wasn't for FOMO. Are you ok with only $20 of frontier usage a month?

rohansood15 29 days ago

Subjective, but if we compare to compute, not everyone needs the most expensive laptops or supercomputers for their work.

I think frontier models will be invaluable for scientific research, defense, financial analysis and such. But the average person probably would be reasonably well-served with a local model.

If you're in sales, customer service, product management and such - the leading open models at the 30B mark are already good enough.

TurdF3rguson 29 days ago

I mean customer service maybe, but how much longer will humans even be doing that job at this point?

booty 29 days ago

Prevailing wisdom is that serving LLMs at a profit is achievable... it's when you factor in the cost of training them that prices get astronomical real fast.

Open-source model inference providers (who do not have to bear the cost of training) seem able to do it at much lower prices.

https://www.together.ai/pricing

https://fireworks.ai/pricing#serverless-pricing (scroll down to headline models)

Of course, it's possible that they are burning through investor cash as well, and apples-to-apples comparisons are not possible because AFAIK Google does not mention the size/paramcount for 3.5 Flash.

But if the prevailing wisdom is true, I think it's actually encouraging. It suggests that OpenAI and Anthropic could perhaps, if they need to, achieve profitability if they slow down model development and focus on tooling etc. instead. If true, that's probably good news for everybody w.r.t. preventing a bursting of this economic bubble.

...my opinions here are, of course, conjecture built on top of conjecture....

eklitzke 29 days ago

Most of the training cost is not in the final training run, it's in all of the R&D (including salaries, equity, etc.) that it takes to get to the final training run. The actual cost of all of the TPUs (or GPUs), power, networking, storage, etc. for the final training run is significant, but it's even more expensive to have this huge R&D team doing frontier model development and using a lot of those same resources during development.

I think you're right that releasing models at a slower cadence would bring down costs to some degree, but it's not clear how much. All of these companies could significantly reduce their opex but at the risk of falling behind in terms of being at the frontier.

HDBaseT 29 days ago

Not to discredit you, because you are 100% correct, but a tangential note about together.ai: they seem fairly unreliable, with constant outages or higher-than-normal latency.

IncreasePosts 29 days ago

Maybe the margins are just very large for Google because they predict so much demand for 3.5?

GodelNumbering 29 days ago

This, combined with locally runnable models getting pretty good recently (e.g. Qwen 3.6), tells me that it's time to seriously consider a local dev setup again.

cft 29 days ago

This should become Apple's new hardware and software play. I am hopeful about the new CEO.

arcatech 29 days ago

Nothing new about that play. They have been heading in this direction for a very long time now.

cft 28 days ago

Then perhaps they would have made the basic spell-checker work on macOS on Apple silicon in all that time?

MASNeo 29 days ago

Besides the cost savings, you get control, transparency, and the ability to pick small language models or LoRA adapters that you can serve even more cost-effectively.
