HN Mirror



	by amunozo 3 days ago
	These price and speed optimizations from Chinese providers, combined with the rising prices from American ones, will change the game sooner rather than later. Many companies are already running into problems with their AI bills.

5 comments

kypro 3 days ago

Another problem is that US models are all closed source, and if you're a large corporate you may not want your org to be held hostage by OpenAI / Anthropic.

I genuinely don't understand what moat these US model labs have. If they're saying recursive self-improvement is just around the corner and Chinese labs are only slightly behind the leading US models, what moat do the US labs have? Are the US models going to recursively self-improve better than the Chinese open-source ones or something?

I might be completely wrong about this, but if I had money in OpenAI or Anthropic I'd be pulling it all right now. I think the chance of them going to near-zero over the next few years is very significant.

hobofan 3 days ago

> you may not want your org to be held hostage by OpenAI / Anthropic

Or Google. I'm working with multiple customers right now who are very pissed at Google for deprecating Gemini 2.5 Flash and canning the GA release of 3.0 Flash, and who now have to decide whether to bite the bullet on the 5x price increase for 3.5 Flash or switch providers. Quite a few of them will likely fully pivot to open models.

bachmeier 3 days ago

I'd be curious if any of your customers have tried 3.1 Flash Lite. It's cheaper than 2.5 Flash, and in my experience with the free tier, quite an upgrade in terms of quality of response. My suspicion is that Google is killing off the old models because they aren't a good value for the customer or for themselves.

hobofan 2 days ago

Most of them are using it for complex data-extraction use cases where they are already in a tricky cost-vs-quality compromise. Some of them have evaluated 3.1 Flash Lite, but for all of them it performed worse than 2.5 Flash and below requirements.

The only ones I've seen switch to 3.1 Flash Lite were coming from 2.5 Flash Lite, and all for the simplest use cases, e.g. small UX enhancements.

lokar 3 days ago

Their moat is cash to pay politicians to regulate away competition.

GoToRO 3 days ago

maybe the moat is that we slowly start to forget how to code by hand and then you -need- the AI tool.

ChrisClark 3 days ago

I think they are racing because the first ASI will 'win' and prevent others from catching up. Of course, we won't be able to bake the right goals into it.

tancop 3 days ago

I don't think it's going to automatically prevent others. Super Claude might understand why diversity is important. If we're talking sci-fi scenarios, the most likely one is probably Overwatch (multiple independent AIs with gray ethics and complicated relationships) more than Skynet.

MangoCoffee 3 days ago

Chinese models are good enough and cheap.

I have a yearly GitHub Copilot subscription. Microsoft recently changed their billing to be token-based. I'm still getting billed per premium request, but GPT 5.4 now counts as 6x compared to 1x before.

reactordev 3 days ago

It's going to be an issue when China ends up scaling faster as well. Faster tokens, faster clusters, QAT models, FP4; it's getting scary.

AndrewKemendo 3 days ago

Issue for who?

fillskills 3 days ago

Issue for any country that is not China. A single country capturing most of the AI token business would generally be bad for the global economy. I'm hoping against hope that this business gets globally distributed and that there is healthy marketplace competition overall.

reactordev 3 days ago

It’s all about economic warfare. The cheaper you can run the models, the cheaper you can offer them, undercutting expensive tiers that come with token limits or exorbitant billing practices.

You are right to be scared, because this race to the bottom also provides open weights/models/QATs for the rest of us, and it’s been crazy to see how good they can be on a consumer-grade RTX card.

throwa356262 3 days ago

For Uncle Sam Altman.

reactordev 3 days ago

American politics and the far right.

fortzi 3 days ago

For the West

nchmy 1 day ago

Try using opencode go with your GitHub Copilot Chat. You get easy, cheap access to Chinese models within the familiar interface.

varispeed 3 days ago

I see a bigger problem with model inconsistency. You never know whether Anthropic will route your request to a cheaper model for the price of Opus. So you can never estimate how much a task will cost, because you might have to restart several times and pay for each attempt. Then you have to prompt models to gauge whether they are real or impostors, which also adds to token usage.

ignoramous 3 days ago

> You never know whether Anthropic will route your request to a cheaper model for the price of Opus

For non-subsidized plans? Pretty sure they'd need to put this in the ToS, or lawsuits would have followed by now.

trollbridge 3 days ago

How can you prove it?

Sometimes Opus just gives me a rubbish session.

chairmansteve 2 days ago

But you don't know why...

RussianCow 3 days ago

Isn't that true of any provider? Anyone could be lying about what they're serving.

ignoramous 2 days ago

Yep. For open weights at least, there are possible ways to verify. E.g.: https://www.kimi.com/blog/kimi-vendor-verifier

csomar 3 days ago

1. How would you know?

2. They are doing lots of shady stuff that would have gotten someone else banned by Visa/Mastercard. Your paid-for plan literally changes after billing...

I think people are letting them get away with it for now because, if it turns out they really will have AGI, they want to be on their good side? Otherwise we might see the knives come out.

sometimelurker 3 days ago

No, they 100% use MTP with a cheaper model alongside Opus, and it would in fact be unprovable if they just sometimes switched to auto-accepting everything from the MTP. It's true that if they did, Anthropic would need to hide it, so it's probably not a huge deal.
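A minimal sketch of the mechanism being argued about, assuming greedy speculative decoding with toy stand-in models. The names `speculative_step`, `generate`, `draft_model`, and `target_model` are hypothetical and say nothing about how Anthropic actually serves Opus. A cheap draft proposes `k` tokens, the expensive target verifies them, and with verification on, the output is identical to what the target would have produced alone; the hypothetical `auto_accept` flag skips verification and silently changes the output.

```python
from typing import Callable, List

# Toy stand-in for a language model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int,
                     auto_accept: bool = False) -> List[int]:
    """One round of greedy speculative decoding; returns the tokens to append."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    if auto_accept:
        # The hypothetical shortcut from the comment: skip verification entirely.
        return proposal
    # 2. The expensive target model checks each proposed position
    #    (real systems score all k positions in one batched forward pass).
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: keep the target's token and end this round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # Every draft token matched, so the target contributes one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: Model, target: Model, k: int, n: int,
             auto_accept: bool = False) -> List[int]:
    """Generate n new tokens after prefix using repeated speculative steps."""
    out = list(prefix)
    while len(out) - len(prefix) < n:
        out.extend(speculative_step(out, draft, target, k, auto_accept))
    return out[len(prefix):len(prefix) + n]


# Toy models over digits: the target counts upward; the draft is wrong after a 4.
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], draft_model, target_model, k=3, n=8))
    # → [1, 2, 3, 4, 5, 6, 7, 8]
    print(generate([0], draft_model, target_model, k=3, n=8, auto_accept=True))
    # → [1, 2, 3, 4, 0, 1, 2, 3]
```

With verification, the draft only affects speed; with auto-accept, the output drifts as soon as the draft is wrong, which a user would see only as a worse session. Real deployments that sample (rather than decode greedily) use a probabilistic acceptance rule instead of exact matching, and some models (e.g. DeepSeek V3) can use built-in multi-token-prediction heads as the drafter rather than a separate model.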

ilaksh 3 days ago

I'm kind of poor so I have been trying to use DeepSeek v4 Flash, GLM 5.1 etc. as much as possible recently instead of Claude or GPT.

petesergeant 3 days ago

You would do us all a service by telling us how your experiences of that have been.

RussianCow 3 days ago

I've been doing the same, though admittedly out of curiosity more so than lack of funds. The open models are catching up quickly in their abilities, to the point where they're (mostly) not doing stupid stuff regularly, but you have to be very specific about what you want. I found that Opus, for example, is much better at asking me to clear up ambiguity in a request before starting, whereas the Chinese models tend to "fill in the blanks" and make their own assumptions.

My current workflow involves going from PRD -> execution plan -> build -> review, and this works nicely with open weight models like GLM 5.1, Kimi K2.6, and DeepSeek V4 Flash. With Opus I can generally skip the PRD entirely, and sometimes even skip the plan, and 80-90% of the time it does exactly what I want. But that can easily burn $5-15 for one feature, whereas it'll cost maybe $1-2 with the open weight models (at API pricing).

andai 3 days ago

> ... you have to be very specific about what you want. I found that Opus, for example, is much better at asking me to clear up ambiguity in a request before starting, whereas the Chinese models tend to "fill in the blanks" and make their own assumptions.

That's the main thing I've noticed. Small models can follow instructions just fine, if the instructions are very specific. But then I often have to spend more time explaining a task than it would have taken me to do it myself.

The bigger models have a lot more common sense.

I wonder if that could be improved slightly through prompting, e.g. asking it to clarify anything that's confusing. Or maybe it just makes incorrect assumptions without realizing the ambiguity. One way to find out!

nchmy 1 day ago

This is my observation as well with DeepSeek v4 Flash. It takes too much initiative, and is often not particularly smart. Yet I find it is so fast and good at iterating on and correcting its mistakes that it eventually finds the way on its own.

Though I tend to use it as a pair programmer, so I just stop it and provide guidance.

The real problem is that it is excessively verbose: it's impossible to keep up with its train of thought, and not practical to read it all. So I tend to just let it do its thing, then skim a bit and skip to the end for its summary.

Try the opencode go subscription: you get the Chinese models at a 6x discount. I spend like $1 a day...

ilaksh 3 days ago

I'd say about 35% of the time I run into problems, eventually give up, and go to GPT 5.5, which handles the original task much more efficiently. Then I see the token costs going up, and that motivates me to keep trying the open-source ones.

andai 3 days ago

Did you try deepseek v4 pro as well? And what kind of tasks?

I'm seeing some people say Flash is amazing and can handle everything, and some say it's useless. It seems to depend on the task. I think it depends on the harness too (it works better in Claude Code in my experience; it's probably been trained on that).

ilaksh 2 days ago

The problem for me with DeepSeek v4 Pro is that a significant amount of the time it just seems to never finish what it's doing: looong thinking, then a lot of time to output, or it just never finishes. That has happened to me several times. It could be partly my agent framework, but I have heard other people complain about that too.

Flash has limitations, but it is way better than I'd expect from something open source named Flash.

Schlagbohrer 2 days ago

There's going to be a tipping point where it's worth purchasing more hardware to run the next biggest size of the open model, if they show stepwise improvements that way.

polski-g 3 days ago

I used Opus 4.6, then downgraded to Sonnet, then to GLM5/5.1. GLM is as good as Sonnet. I recently started using Opus 4.8 again and GLM is not close to that.

30-day eval for each.

csomar 2 days ago

The only one that is really close to Claude in performance is GLM-5.1. The others (Mimo, DeepSeek, etc.) look good on paper but usually fail at multi-step agentic orchestration.

This is at least my experience with Claude Code as harness. Also, GLM pricing is not that far off from Claude. It's cheaper but not DeepSeek cheap.

nchmy 1 day ago

DeepSeek v4 Flash is amazing.

throwaway894345 3 days ago

I wonder what economics are driving these pricing decisions. Are the Chinese companies just subsidizing their models to a greater degree than the US ones, or is this an emergent property of differing energy policies between countries?

comboy 3 days ago

For one, they invested in infrastructure. They can build fast and efficiently. They can provide power, they can provide cooling. Even if you just make roads better, you make everything more efficient. Plus the general level of education. It all compounds.

On HN, China is seen as a cheap-labor copycat. This used to be a fair approximation at some point in the past. In my opinion, China is getting ahead of everyone else by much more than the US ever did.

SF is a beautiful thing in the US; vast power and wealth come from there. Smart people collaborating, communicating, and building fast and with excitement. China did the SF kind of thing for many different sectors in many different places.

Octoth0rpe 3 days ago

Throwing out another factor: Chinese companies have been banned from and/or limited in buying Nvidia hardware, and turned to local companies instead. I haven't actually seen pricing/benchmarks comparing Chinese AI accelerators, but it wouldn't surprise me if that also worked out in their favor.

lokar 3 days ago

And, possibly, state subsidies at every level.

Schlagbohrer 2 days ago

I have to point out the massive state subsidies in the United States for the tech companies and datacenter builders.

throwaway67678 3 days ago

Lower cost of labor, lots of under-the-hood optimizations (e.g. cache hits for DS), existing infra at many of these companies (fewer upfront costs for deployment), etc.

ecshafer 3 days ago

China isn't that cheap for labor. And if you think the guys at Z.ai or xiaoxiao aren't the exact same guys from Tsinghua, Peking, MIT, Stanford, CMU, etc., pulling in amazing salaries, you'd be wrong.

throwaway67678 3 days ago

I'd assume there's more to the cost of labor than the salaries of the elite folks who do the R&D, but fair point

nmfisher 3 days ago

Z.ai was actually a spin-off from Tsinghua (THUDM) AFAIK.

Their models are much smaller: 1T parameters vs 5T for the frontier models. 1T is Sonnet/Google Flash size, not Opus size.

The $0.87/M tokens price for Mimo Pro is probably subsidized.

Mimo models aren't widely available on western providers, but Kimi and DeepSeek are similar sizes and cost about the same to run. They are priced at $3-$4/M tokens (which is right where Google's very confusing range of Flash models is priced: between $0.40/M tokens and $9/M tokens depending on exactly which model, and you don't want the $9 one!).

Anthropic overprices Sonnet (probably because of their capacity issues). GPT 5.4 mini is $4.50/M tokens.

https://docs.fireworks.ai/serverless/pricing

https://www.together.ai/pricing

Cakez0r 2 days ago

I'm not sure about those parameter sizing claims. Regardless of parameter size, benchmarked intelligence of Chinese and Western frontier models is comparable, so who cares how many parameters it takes to get there.

Mimo is also widely available on western providers. It's on openrouter and you can sign up with Xiaomi directly for a token plan on an English website priced in dollars.

rstuart4133 3 days ago

The Chinese economics: possibly a lesson taken from the USA's own experience.

It was pretty clear the USA won World War 2 because it out-produced and out-innovated everyone else. Probably with that in mind, after World War 2 the USA adopted the "Vannevar Bush" model, summarised in this picture: https://www.researchgate.net/figure/annevar-Bushs-Science-th... The idea is to jump-start R&D through public funding. The hoped-for outcome was that R&D would feed private enterprise, leading to a productivity boom.

The boom happened, and the USA did seem to out-compete everybody else in R&D, science, and the products they delivered for decades after that.

That way of doing things seems to have faded over time in the USA. The decline seemed to coincide with the rise of Neo-economics, and now of course it's been obliterated by Trump. He's very keen to fund Intel to produce chips in a year or two's time (which is something the stock market and banks do perfectly well), but basic science funding is getting drastic cuts.

Still, other countries noticed the rise of the USA, and some adopted similar funding models for basic R&D. China seems to have picked it up with gusto, subsidising both R&D and STEM training, leading to huge numbers of engineers and scientists. Whether it will lead to an economic boom remains unknown, but the acceleration of ideas and innovations coming out of China seems undeniable. More recently, Ukraine showered its local engineering garages with funds in the hope of getting a similar outcome to the USA in WW2. It looks like it worked. If the Iran war continues, it's entirely possible the arms trade will reverse: the USA could well start buying drones from Ukraine.

orphea 3 days ago

Maybe not being led by a sociopath also helps.

throwaway894345 3 days ago

I'm pretty sure Xi is also a sociopath, but he differs from Trump in that he's competent. And maybe that's a good thing for American democracy: if we had a competent dictator who could manifest massive infrastructure projects, maybe the pro-democracy backlash would be significantly attenuated?

orphea 2 days ago

Oh, I was thinking of OpenAI and Anthropic CEOs.

throwaway894345 2 days ago

Heh, isn’t it fun living in a timeline where there are so many sociopathic leaders that your earlier comment is ambiguous? (: