by kaycey2022 238 days ago
Gemini CLI is definitely a much worse client than some of the other agent clients like opencode, Cursor, etc. But in my experience, that isn't because of the model quality. I get better quality responses from the Gemini web chat interface than from ChatGPT, Claude, etc.

Of course my experience is anecdotal, but we hardly have any decent benchmarks to compare these models. I suspect most benchmarks have leaked into training sets, rendering them useless anyway.

2 comments

Also, people don't talk enough about the model vs. the client tool (or are bad at separating the two themselves) - e.g. from your comment, maybe using codex/Claude Code/aider with the Gemini API would be better, best even, but people rarely make that comparison or separation; it's always 'Claude Code with Claude vs. codex with GPT-x' etc.
Yeah. The client tool does make a difference. For example opencode, if I am correct, just spins up its own language servers and then feeds the language server errors back into the model, resulting in a much better agentic coding experience. I don't think they are doing anything much more complex than that.
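To make the idea concrete, here is a minimal sketch of that kind of feedback loop. It is not opencode's actual code. Everything in it is an assumption for illustration: `agent_loop`, `diagnostics`, and the `model` callable are hypothetical names, and Python's built-in `compile()` stands in for a real language server so the example stays self-contained.

```python
from typing import Callable, List


def diagnostics(source: str) -> List[str]:
    """Stand-in for a language server: report syntax errors via compile()."""
    try:
        compile(source, "<agent-edit>", "exec")
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]
    return []


def agent_loop(model: Callable[[str], str], task: str, max_rounds: int = 3) -> str:
    """Ask the model for code, then feed any diagnostics back until it is clean.

    `model` is a hypothetical callable mapping a prompt string to source code.
    Returns the last attempt, which may still have errors if max_rounds runs out.
    """
    prompt = task
    source = ""
    for _ in range(max_rounds):
        source = model(prompt)
        errors = diagnostics(source)
        if not errors:
            return source
        # Append the diagnostics and the failed attempt to the next prompt.
        prompt = (
            task
            + "\n\nYour last attempt had these errors:\n"
            + "\n".join(errors)
            + "\n\nPrevious attempt:\n"
            + source
        )
    return source
```

A real client would presumably get its diagnostics from a language server over LSP rather than from `compile()`, but the loop has the same shape: generate, check, feed the errors back.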

Unfortunately, nearly all the foundation model companies are just wasting their efforts on the clients, which are kind of ass, instead of focusing on the model.

Google would be much better off if they ditched their dogshit CLI and let us use the generous-quota login from any client.

To be fair, most of the time, the tools work best with the models trained with those tools in mind, and vice versa.

Not to mention, not all models/inference work the same way, so you can't really replicate the same experience. For example, the new Harmony format means you can now inject messages while GPT-OSS is running inference, but obviously Claude Code doesn't support that because their models don't support that.

> most of the time, the tools work best with the models trained with those tools in mind

This is a garbage state of affairs though

What do you expect? People building software around models other than the ones they develop themselves? Or people training models for software other than their own?

It's like saying official car repair shops should repair any type of car, not just their brand. That's just not how the real world works.

Or they just don't even do it? Michelin tyres aren't best fitted by a Michelin shop, those don't exist. You go to KwikFit or whatever you think's best, and get Michelin or Continental or Pirelli or whatever you think's best fitted.
Terrible analogy, because the issues involved in fitting car tires of different brands are in no way comparable to the differences between LLM behavior across models.
I agree, Gemini Pro is a great model for coding if you don't need to do agentic work. I've found that it's a lot less "wordy" when editing, debugging, reviewing, etc. It gets to the point, whereas other models can provide long, useless explanations. It's also very smart and great with long context.