by deepdarkforest 261 days ago
The Chinese are doing what they have been doing to the manufacturing industry as well. Take the core technology and just optimize, optimize, optimize for 10x the cost/efficiency. As simple as that. Super impressive. These models might be benchmaxxed, but as another comment said, I see so many that it might as well be the most impressive benchmaxxing today, if not just a genuinely SOTA open-source model. They even released a closed-source 1-trillion-parameter model today that is sitting at No. 3 (!) on LM Arena. Even their 80B model is 17th, while gpt-oss 120b is 52nd. https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2...

They still suck at explaining which model they serve is which, though.

They also released Qwen3-VL Plus [1] today alongside Qwen3-VL 235B [2], and they don't tell us which one is better. Note that Qwen3-VL-Plus is a very different model compared to Qwen-VL-Plus.

Also, qwen-plus-2025-09-11 [3] vs qwen3-235b-a22b-instruct-2507 [4]. What's the difference? Which one is better? Who knows.

You know it's bad when OpenAI has a clearer naming scheme.

[1] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

[2] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

[3] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

[4] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...

> They still suck at explaining which model they serve is which, though.

"they" in this sentence probably applies to all "AI" companies.

Even the naming/versioning of OpenAI models is ridiculous, and then you can never find out which is actually better for your needs. Every AI company writes several paragraphs of fluffy text with lots of hand waving, saying how this model is better for complex tasks while this other one is better for difficult tasks.

Both DeepSeek and Claude are exceptions: simple version numbers, and Sonnet is overall worse but faster than Opus within the same version.

Eh, I mean, innovation is often made just by letting a lot of fragmented, small teams of cracked nerds try stuff out. It's way too early in the game. I mean, Qwen's release statements have anime, etc. IBM, Bell, Google, Dell: many did it similarly, letting small, focused teams make many attempts at cracking the same problem. All modern quant firms basically do the same as well. Anthropic is actually an exception, more like Apple.

It's sometimes not really a matter of which one is better but which one fits best.

For example, many have switched to Qwen3 models, but some still vastly prefer the reasoning and output of QwQ (a Qwen2.5 model).

As for the difference between them: the ones with "Plus" are closed-weight, so you can only access them through their API. The others are open-weight, so if they fit your use case, and if the want or need ever arises, you can download them, use them, and even fine-tune them locally, even if Qwen no longer offers access to them.

If the naming is so clear to you, then why don't you explain: for a user who wants to use Qwen3-VL through an API, which one has better performance? Qwen3-VL Plus or Qwen3-VL 235B?

My previous post should have answered this question. But since it didn't, I think I'm ill-equipped to answer you satisfactorily; I would just be repeating myself.

Exactly. You're ill-equipped to answer the question because you don't know. Qwen is terrible at explaining what the difference is between the models they serve on their API.

It's such a simple question: "For someone who does not want to run the model locally, what is the difference between these 2 models on the API?" and yet nobody can answer that question.

> Take the core technology and just optimize, optimize, optimize for 10x the cost/efficiency. As simple as that. Super impressive.

This "just" is incorrect.

The Qwen team invented things like DeepStack https://arxiv.org/abs/2406.04334

(Also I hate this "The Chinese" thing. Do we say "The British" if it came from a DeepMind team in the UK? Or what if there are Chinese-born US citizens working in Paris for Mistral?

Give credit to the Qwen team rather than a whole country. China has both great labs and mediocre labs, just like the rest of the world.)

The naming makes some sense here. It's backed by the very Chinese Alibaba and directly by the government as well. It's almost a national project.

The Americans do that all the time. :P
> Do we say "The British"

Yes.

Yeah, it's just weird Orientalism all over again.

> Also I hate this "The Chinese" thing

To me it was a positive assessment. I adore their craftsmanship and their persistence in moving forward over a long period of time.

It erases the individuals doing the actual research by viewing Chinese people as a monolith.

Interestingly, I've found that models like Kimi K2 spit out more organic, natural-sounding text than American models.

It fails on the benchmarks compared to other SOTA models, but the real-world experience is different.

> Take the core technology and just optimize, optimize, optimize for 10x the cost/efficiency.

This is what really grinds my gears about American AI and American technology in general lately, as an American myself. We used to do that! But over the last 10-15 years, it seems like all this country can do is try to throw more and more resources at something instead of optimizing what we already have.

Download more RAM for this progressive web app.

Buy a Threadripper CPU to run this game that looks worse than the ones you played on the Nintendo GameCube in the early 2000s.

Generate more electricity (hello Elon Musk).

Y'all remember your algorithms classes from college, right? Why not apply that here? Because China is doing just that, and frankly making us look stupid by comparison.