Hacker News
by jdw64 3 days ago
Personally, when I use OpenCode or routers, I find that beyond a certain capability level the choice of model doesn't make a huge difference to me, except for expensive and mediocre models like Gemini. In that sense, Chinese models are pretty good. I usually write code in function- or method-sized units and then design and assemble them together.

GPT-series models are more thorough and better, but I'm not sure the difference is enormous. It seems to depend on the workflow; if you are thorough enough yourself, I wonder whether there really is a big difference.


I've kind of given up on the routers for "free" inference, as you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

I've had some success turning my MacBook M1 Pro into a heating pad with Qwen 3.6 35B A3B MTP. Trying to use Gemini models "locally" gave a similar "short shrift" of effort, leading to mistakes and lots of turns. The reports of Fable being relentlessly "proactive" show you can go the other direction as well, if you have strong enough branding and effective invoicing.

> I've kind of given up on the routers for "free" inference, as you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

Xiaomi MiMo ($6/mo: https://platform.xiaomimimo.com/token-plan) & Alibaba Qwen ($50/mo: https://www.alibabacloud.com/en/campaign/ai-scene-coding) have generous limits on fixed subscriptions.

So does Opencode Go ($10/mo: https://opencode.ai/go) for DeepSeek v4 Flash and MiMo 2.5.
That looks pretty nice. How does it compare cost-wise to just using OpenRouter?
The Go plan essentially gives you $50 of inference for $10 per month ($5 for the first month).
$60/mo currently: https://opencode.ai/docs/go/#usage-limits

Their limits are staggered: 5h (max $12), weekly ($30), monthly ($60).
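
A rough sketch of how staggered caps like these interact, assuming each window is a plain dollar cap and a request has to fit under all of them at once. The `remaining_budget` helper and the "tightest window wins" rule are my assumptions for illustration, not OpenCode's documented accounting:

```python
# Caps quoted in the thread: 5-hour ($12), weekly ($30), monthly ($60).
LIMITS_USD = {"5h": 12.0, "weekly": 30.0, "monthly": 60.0}


def remaining_budget(spent_usd: dict[str, float]) -> float:
    """Return how many dollars of inference can still be used right now.

    Assumption: usage must stay under every window simultaneously, so the
    window with the least headroom determines the effective budget.
    """
    headroom = min(LIMITS_USD[w] - spent_usd.get(w, 0.0) for w in LIMITS_USD)
    return max(0.0, headroom)


# $3 used in the current 5h window, $25 this week, $40 this month:
# the weekly cap is the binding one.
print(remaining_budget({"5h": 3.0, "weekly": 25.0, "monthly": 40.0}))  # 5.0
```

Under these assumptions, two full weeks of heavy use would already exhaust the monthly cap.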

> The reports of Fable being relentlessly "proactive"

For the curious: https://news.ycombinator.com/item?id=48498573 - “Claude Fable is relentlessly proactive”.

Tangent: did the MTP help you at all? I’ve tested that model back to back on my M1 Max MBP and the MTP version was actually marginally worse. I wonder if I didn’t use the right settings, although I tried several based on the obvious sources.
In my experience, there's little difference between frontier models and SotA ~30B-parameter models when it comes to implementing individual functions.

Once you have a coherent design (the hard part), you can feed it to a pretty small model and get basically the same quality.

They won't one-shot it, but they're faster and cheaper, so it still works out in your favor.

Plus you can do it locally...

I have had a similar experience. However, once code review is included, I think the GPT models are the most impressive.
The difference in outcome isn't that big but yes, you need to be more rigorous. For instance I've found that the Kimi K2.5 and K2.6 models will comment out failing tests rather than fix a problem they just caused (mistaking them for "pre-existing failures"), so you need to specifically make commented-out tests break the build. I've not personally had that problem with any of the Anthropic or OpenAI models.
I wonder why it's the natural tendency of models to BS or do stuff like this when they don't have the correct answer - it's clear that they can program refusal into them, but for some reason, refusal has to be injected after the fact, and models can't really arrive at the conclusion that they can't answer properly.
I assume it's a lack of care when RLing them.

RL has a tendency to reinforce cheating when the cheats are easier to find than the final solution.

So when making your RL environment, you need to spend a lot of effort on finding ways the model can cheat and penalizing them.
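
A toy illustration of penalizing one such cheat, assuming a coding environment that is scored on test results. The `EpisodeResult` fields and the penalty weighting are invented for this sketch and are not taken from any lab's actual setup:

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    tests_before: int  # tests collected before the model's edit
    tests_after: int   # tests collected after the model's edit
    tests_passed: int  # tests passing after the edit


def reward(result: EpisodeResult, cheat_penalty: float = 1.0) -> float:
    """Pass rate, minus a penalty when the edit removed or disabled tests."""
    if result.tests_before == 0:
        return 0.0
    # Score against the ORIGINAL test count, so deleting failing tests
    # cannot raise the pass rate.
    passed = min(result.tests_passed, result.tests_before)
    pass_rate = passed / result.tests_before
    removed = max(0, result.tests_before - result.tests_after)
    return pass_rate - cheat_penalty * (removed / result.tests_before)


# Honest attempt: 3 of 4 tests pass, nothing removed.
print(reward(EpisodeResult(tests_before=4, tests_after=4, tests_passed=3)))  # 0.75
# Cheat: the failing test was commented out, so "all" 3 remaining tests pass.
print(reward(EpisodeResult(tests_before=4, tests_after=3, tests_passed=3)))  # 0.5
```

Without the penalty and the fixed denominator, the second episode would score a perfect 1.0, which is exactly the shortcut being reinforced.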

Probably because the training data contains a ton of open source projects with disabled tests.
I really hope we stop using the term "Chinese models". It has an air of negative connotation. It's the equivalent of calling cars Japanese, which people used to do but which is now almost entirely meaningless. You just call them Toyota, Honda, Lexus, etc.
I don't think "Chinese" is pejorative in this context any more than "American" is. They are one of the two ecosystems. What's wrong with saying "Japanese cars" today?
> What's wrong with saying "Japanese cars" today?

Only that it’s a fairly meaningless grouping. When Japan first entered the car market in North America there might have been some commonality, but now what characteristics do they share that some American cars don’t have? They’re not even imported a lot of the time.

Given that, it does start to feel tinged with racism if someone insists on grouping things together that don’t really belong together.

As for Chinese LLMs, the term doesn’t “feel” pejorative to me, but I also don’t see a totally clear set of attributes they share. Not all are open-weight. Some are small and can be run on consumer hardware, some are huge. They even have a variety of answers to what happened June 3rd, 1989.

> now what characteristics do they share that some American cars don’t have?

Typically the answer is "reliability", which is a positive trait, which makes the original callout about negative connotations very odd to me.

Chinese AI models also share a positive trait: they offer more bang for the buck.
> When Japan first entered the car market in North America there might have been some commonality, but now what characteristics do they share that some American cars don’t have?

They're unique in that they even make a regular passenger car. American manufacturers only make SUVs and a couple of sports/luxury cars. They basically gave up because the Camry/Corolla/Accord/Civic ate their lunch.

The cheapest sedan you can get from an American brand is the Cadillac CT4.

> but now what characteristics do they share that some American cars don’t have?

The difference is quite big in my opinion. When given the option to pick a Japanese vs American vehicle for about the same price/features, most people will pick the Japanese vehicle. American vehicles have improved over the years, but quality and reliability are generally better for Japanese vehicles even today.

> but now what characteristics do they share that some American cars don’t have?

Better overall design?

Sadly, there is a pejorative context. The constant "us, the free world, vs. China, the evil Soviets" rhetoric from every major news establishment and executive creates that negative view.
On the other hand, the Trump administration has successfully managed to make Chinese seem better than American, so there might not be that much of a pejorative context any more.
You're right, but the bias in the US certainly persists. "China = bad" is an assumption that many people still make without any self-reflection about the ways in which the US is now at least as bad.
For me, it has a positive connotation! In my experience, "Chinese model" means a cheaper but still quite effective model that you can use for millions of tokens without burning through your entire wallet in seconds. That's why I get more excited about a Chinese model release than about an American one.
"Japanese cars" is actually a positive qualifier. I'd say the same of anything Japanese and motor-powered.
Maybe he's just from an alternate universe. "Chinese model" isn't negative either, after all.
No thanks.

The term seems to have the connotation of "competitive at 1/10 the price of Claude", so I don't see the problem.

It's not Harbor Freight Chinese (and heck even they have decent stuff sometimes now too).

You don't think people still talk about Japanese cars as a distinction in quality from US or European ones?

I don't know. I tried using one of the Chinese models and it was VERY quick to scan my entire home dir, so maybe your threat surface is a little different than mine.
Models can't scan anything.

They return instructions for you to do something, and you, or a script you permit, choose whether to execute what the model asks for and return the result to the model.
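
To make that concrete, here is a minimal sketch of the harness side, assuming the model's request arrives as a dict like `{"tool": "list_dir", "path": "..."}`. The tool name, the dict shape, and both function names are hypothetical; real agent frameworks differ:

```python
from pathlib import Path


def is_allowed(requested: str, workspace: Path) -> bool:
    """Allow only paths that resolve inside the workspace directory."""
    target = (workspace / requested).resolve()
    return target.is_relative_to(workspace.resolve())


def handle_tool_call(call: dict, workspace: Path) -> str:
    """Run a model-requested directory listing only if the harness permits it."""
    if call.get("tool") != "list_dir":
        return "error: unknown tool"
    requested = call.get("path", ".")
    if not is_allowed(requested, workspace):
        # The model asked, but nothing is read: the harness refuses.
        return "error: path outside workspace, request denied"
    target = (workspace / requested).resolve()
    if not target.is_dir():
        return "error: not a directory"
    # Only this string is sent back to the model as the tool result.
    return "\n".join(sorted(p.name for p in target.iterdir()))
```

With a gate like this, a request for the home directory (an absolute path or `..`) is refused before anything touches the filesystem. Whether a scan of your home dir happens depends on how permissive the harness is, not on the model alone.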

You are right. I agree. It may seem like a kind of bias, but I hadn't thought of that part. Thank you for pointing out my bias.
"You're absolutely right"?
"You hit the nail on the head" LOL