

by Philip-J-Fry 110 days ago

I think the difference is that with LLMs, in a lot of cases you do see some diminishing returns.

I won't deny that the latest Claude models are fantastic at one-shotting loads of problems. But we have an internal proxy to a load of models running on Vertex AI, and I accidentally started using Opus/Sonnet 4 instead of 4.6. I genuinely didn't notice until I checked my configuration.

AI models will get to this point where for 99% of problems, something like Gemma is gonna work great for people. Pair it up with an agentic harness on the device that lets it open apps and click buttons and we're done.

I still can't fathom that we're in 2026 in the AI boom and I still can't ask Gemini to turn shuffle mode on in Spotify. I don't think model intelligence is as much of an issue as people think it is.

5 comments

dimmke 110 days ago

100% agree here. The actual practical bottleneck is harness and agentic abilities for most tasks.

It's the biggest thing that stuck out to me using local AI with open source projects vs Claude's client. The model itself is good enough I think - Gemma 4 would be fine if it could be used with something as capable as Claude.

And that's gonna stay locked down, unfortunately, especially on mobile and in cars. It needs access to APIs to do that stuff, and not just regular APIs that were built to be invoked the traditional way.

The same way that websites are getting llms.txt files, I think APIs will also evolve.

Tianning 110 days ago

Agreed on the diminishing returns. The Opus 4.6 anecdote is a good signal.

wj 109 days ago

I'm not sure I understand your last paragraph? The two sentences seem to contradict?

BoorishBears 109 days ago

GPT-3.5 was intelligent enough to understand that command and turn it into a correctly shaped JSON object. The platforms just don't have tight enough integration to take advantage of the intelligence.
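
To make that concrete, here is a minimal sketch of the gap being described. Everything in it is hypothetical: the `set_shuffle` action, the `TOOLS` schema, and the handler are made up for illustration and are not any real Spotify or assistant API. The point is only that validating the model's JSON is the easy part; the command fails at dispatch when the platform exposes no handler.

```python
import json

# Hypothetical tool schema: action name -> required argument names and types.
TOOLS = {"set_shuffle": {"app": str, "enabled": bool}}


def parse_tool_call(raw: str) -> dict:
    """Validate a model's JSON output against the hypothetical TOOLS schema."""
    call = json.loads(raw)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    schema = TOOLS.get(call.get("action"))
    if schema is None:
        raise ValueError(f"unknown action: {call.get('action')!r}")
    args = call.get("args", {})
    for name, expected in schema.items():
        if not isinstance(args.get(name), expected):
            raise ValueError(f"bad or missing argument: {name}")
    # Keep only the arguments the schema knows about.
    return {"action": call["action"], "args": {k: args[k] for k in schema}}


def dispatch(call: dict, handlers: dict) -> str:
    """Run the call if the platform registered a handler for it."""
    handler = handlers.get(call["action"])
    if handler is None:
        return "no integration available"
    return handler(**call["args"])


# What a model might emit for "turn shuffle mode on in Spotify" (assumed shape).
model_output = '{"action": "set_shuffle", "args": {"app": "Spotify", "enabled": true}}'
call = parse_tool_call(model_output)

# The model did its job, but with no handler registered nothing happens.
result_without_integration = dispatch(call, {})

# With a (hypothetical) platform hook, the same call succeeds.
handlers = {"set_shuffle": lambda app, enabled: f"{app} shuffle={'on' if enabled else 'off'}"}
result_with_integration = dispatch(call, handlers)
```

In this sketch the model output is identical in both cases; only the presence of a registered handler changes the outcome.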

bawana 110 days ago

I think security is the issue: AI is good at circumventing it. For example, AI can read paywalled articles you cannot. Do you really want AI to have 'free range'?

mewpmewp2 110 days ago

I mean, to me even the difference between Opus and Sonnet is as clear as night and day, and so is the difference between Opus and the best GPT model. Opus 4.6 just seems much more reliable: when I ask it to do something, it actually happens.

Philip-J-Fry 110 days ago

It depends on what you're asking it, though. Sure, in a software development environment the difference between those two models is noticeable.

But think about the general user. They're using the free Gemini or ChatGPT. They're not using the latest and greatest. And they're happy using it.

And I am willing to bet that a lot of paying users would be served perfectly fine by the free models.

If a capable model is able to live on device and solve 99% of people's problems, then why would the average person ever need to pay for ChatGPT or Gemini?

mewpmewp2 110 days ago

But it shows in other tasks too, like research, where dates, little details, connections, and reasoning are important, and in background research or tool use outside of software development. This is where I find LLMs most useful in my life.

Even Opus makes mistakes with dates, or fails to understand news in its correct chronological context, and it would be even worse with smaller, less capable models.

Scheduling, planning, researching products, shopping, trip plans, etc...

acidtechno303 110 days ago

You're quick to say "to me" in your comparison.

My experience is very different from yours. Codex and CC yield very different results because of both the harness differences and the model differences, but neither is noticeably better than the other.

Personally, I like Codex better just because I don't have to mess with any sort of planning mode. If I imply that it shouldn't change code yet, it doesn't. CC is too impatient to get started.

mewpmewp2 110 days ago

I guess yes, that's a harness difference, and you can also configure CC as a harness to behave very differently. But even with the same harness and guidance, "to me" there's still a difference between Opus 4.6 and e.g. GPT 5.4. Or which GPT model do you use? I've been using Claude Code, Codex and OpenCode as harnesses recently, but for serious long-running implementation work I feel like I can only really rely on CC + Opus 4.6.

acidtechno303 110 days ago

Yes, 5.4.

Perhaps Opus is superior and I'm just jaded.

I came from Cursor before adopting the TUI tools. Opus was nothing short of pathetic in their environment compared to the -codex models. I would only use it for investigations and planning because it was faster.

Like you've said, though, that could just be a harness issue.

charcircuit 110 days ago

I have the opposite experience. Codex gets to work much faster than Claude Code. Also, I've never seen the need to use planning mode for Claude. If it thinks it needs a plan, it will make one automatically.

acidtechno303 110 days ago

I'll drink to the idea that it's all in my head.
