by jasonjmcghee 239 days ago
I've had a pretty poor experience with Gemini.

I've had to convince it to do things it should just be able to do but, for some reason, thinks it can't. For example, reading from a file outside of the project directory: it can do that fine, but refuses to unless you convince it that, no, it actually can.

It has also inserted a literal "\n" instead of newlines on a number of occasions.

I'd argue these behaviors are much more important than being able to use interactive commands.


Gemini doesn't seem to be trained on tool use (which Claude is), so it quite often thinks it can't do something it certainly can, and does a lot of nonsense. For me it fails nearly every time it tries to read project files, because it uses relative paths instead of absolute ones, so I've put 'For your "ReadFile" and "WriteFile" tool, you MUST use absolute paths to files' in my system instructions.
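An alternative to relying on the model to obey that instruction is to have the client resolve paths before a file tool runs. A minimal sketch, assuming POSIX-style paths; `to_absolute` is a hypothetical helper name, not part of the Gemini CLI:

```python
import os

def to_absolute(path: str, project_root: str) -> str:
    """Hypothetical helper: make a model-supplied path absolute.

    Relative paths are joined onto project_root; an already-absolute
    path is kept as-is (os.path.join discards project_root in that
    case). normpath then collapses "." and ".." segments.
    """
    return os.path.normpath(os.path.join(project_root, path))

print(to_absolute("src/app.py", "/work/proj"))           # /work/proj/src/app.py
print(to_absolute("../shared/notes.txt", "/work/proj"))  # /work/shared/notes.txt
print(to_absolute("/etc/hosts", "/work/proj"))           # /etc/hosts
```

The `..` case lands outside the project directory; whether that is allowed is a policy decision for the client, not something path normalization decides.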

Speaking of system instructions, Gemini always forgets them or doesn't follow them. And it still puts code comments nearly everywhere, it drives me nuts.

Codex is much better at following system instructions but the CLI is..... very bad.

My experience with Gemini 2.5 Pro has oddly been better, maybe because I use RooCode/Cline? It was strangely apologetic, though, wasting tokens lamenting its failure whenever it couldn't do something, instead of just getting on with the solution.

At the same time, even the big versions of Qwen3 Coder (480B) regularly mess up file paths and use the wrong path separators, leading to files like srccomponentsMyComponent.vue being created instead of src/components/MyComponent.vue.
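A name like srccomponentsMyComponent.vue is consistent with Windows-style backslash separators being dropped somewhere along the way (a guess; the cause isn't stated). Once the separators are gone the path can't be recovered, so a client would have to normalize before the string gets mangled. A minimal sketch, assuming the model emits either `\` or `/` as separators; `normalize_model_path` is a hypothetical name:

```python
from pathlib import PureWindowsPath

def normalize_model_path(raw: str) -> str:
    """Hypothetical helper: map '\\' or '/' separators to '/'.

    PureWindowsPath treats both characters as separators, so
    as_posix() returns the same path using forward slashes only.
    """
    return PureWindowsPath(raw).as_posix()

print(normalize_model_path("src\\components\\MyComponent.vue"))  # src/components/MyComponent.vue
print(normalize_model_path("src/components/MyComponent.vue"))    # src/components/MyComponent.vue
```

The trade-off: this also rewrites a legitimate backslash inside a POSIX filename, which is rare but possible.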

> And it still puts code comments nearly everywhere, it drives me nuts.

I’ve had various models insert comments like “// removed Foo” where it makes no sense to mark the absence of something, i.e. a code block that isn’t there and isn’t needed there.

At the same time, the LLMs sometimes love to eat my comments when making changes and leave behind only the code.

How silly (and annoying). It’s good to be able to try out multiple models with the exact same prompts though, maybe I should create my own custom mode for RooCode with all of the important stuff I want baked in.

Codex doesn’t give feedback while it’s running. It just works quietly, so it’s not easy to interrupt when it starts going off the rails.

Claude is better at this.

Set these in the config.toml for codex and you'll get a lot more info while it's running:

    model_reasoning_summary = "detailed"
    model_verbosity = "high"
    model_supports_reasoning_summaries = true
    show_raw_agent_reasoning = true

> Speaking of system instructions, Gemini always forgets them or doesn't follow them. And it still puts code comments nearly everywhere, it drives me nuts.

Yup, I've tried to use Gemini so many times, but its inability to strictly follow system prompts makes it so hard to get useful output that doesn't need cleaning up. Code comments are next to impossible to get rid of; they must have trained it only on code that has comments, because the model really likes to add them everywhere.

Every agent+model combination has issues right now, I'm personally swapping between them depending on the task.

Gemini is great for stuff you need fast and whose quality you don't care about, since you can just throw it away.

Claude Code + Sonnet is great in many ways and follows prompts way better, but has a tendency to go off on tangents and really get lost in the woods. It requires hand-holding: you basically have to interrupt it as soon as you see something weird and steer it in the right direction. Complex stuff has to be aggressively split into smaller, validated sub-tasks manually. It also tends to stop by itself to say "Well, we've done half now, you want me to continue with the other half?"

Codex + GPT-5 is the best at following prompts and produces the highest-quality code, but it is way slower than the others. It still struggles with seemingly arbitrary stuff, yet it can solve complex tasks by itself without any hand-holding. It can get stuck on something obvious, but at least it won't run off on its own, and it'll complete everything as well as it can, even if that takes 30 minutes.

Qwen Coder seems outright unusable; I haven't been able to use it for anything good at all.

Tried AMP for a while as well, nice UI and model seems good, but too expensive (and I say this as someone who currently gives $200/month to OpenAI).

Gemini seems to have a poor model of both what it can and what it is allowed to do.

I’ve noticed the latter with several image generation refusals that I could eventually talk it out of quite easily (usually by mentioning fair use in a copyright/trademark context).

> Gemini seems to have a poor model of both what it can and what it is allowed to do.

Starting to feel like LLMs are more a representation of the culture of the company training them than a fair representation of the world at large.

ConwAI’s law?

Well, those are problems with the underlying Gemini models. It's not like the team responsible for the CLI could have trained a better model instead of building this feature.

Gemini 3.0 is likely to be released soon, and it will likely improve the agentic coding experience.

Gemini CLI is definitely a much worse client than some of the other agent clients like opencode, Cursor, etc. But in my experience, that isn't because of the model quality. I get better-quality responses from the Gemini web chat interface than from ChatGPT, Claude, etc.

Of course my experience is anecdotal, but we hardly have any decent benchmarks to compare these models. I suspect most benchmarks have leaked into training sets, rendering them useless anyway.

Also, people don't talk enough about the model vs. the client tool (or are bad at separating the two). E.g. from your comment, maybe using codex/Claude Code/aider with the Gemini API would be better, best even, but people rarely make that comparison or separation; it's always 'Claude Code with Claude vs. codex with GPT-x' etc.

Yeah. The client tool does make a difference. For example, opencode, if I'm correct, just spins up its own language servers and then feeds the language server errors back into the model, resulting in a much better agentic coding experience. I don't think they are doing anything much more complex than that.
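The loop described there (check the edited code, feed any errors back to the model, retry) can be sketched without a real language server. This is not opencode's implementation: Python's built-in `compile()` stands in for language-server diagnostics, and `ask_model` is a hypothetical callable wrapping whatever model API you use.

```python
from typing import Callable, Optional

def check_syntax(source: str, filename: str = "<generated>") -> Optional[str]:
    """Stand-in for language-server diagnostics: an error string, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None

def edit_with_feedback(ask_model: Callable[[str], str], task: str,
                       max_rounds: int = 3) -> str:
    """Ask for code, check it, and re-prompt with the error until it parses."""
    prompt = task
    source = ""
    for _ in range(max_rounds):
        source = ask_model(prompt)
        error = check_syntax(source)
        if error is None:
            return source
        # Feed the diagnostic back, roughly as an LSP-aware client might.
        prompt = f"{task}\n\nYour last attempt has an error:\n{error}\nPlease fix it."
    return source  # last attempt, even if it still fails to parse

# A fake "model" that gets it wrong once, then right.
attempts = iter(["def f(:\n    return 1\n", "def f():\n    return 1\n"])
result = edit_with_feedback(lambda prompt: next(attempts), "Write f()")
print(result)
```

A real client would use richer diagnostics (type errors, lints) and cap the number of retries, as `max_rounds` does here.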

Unfortunately, nearly all the foundation model companies are just wasting their efforts on the clients, which are kind of ass, instead of focusing on the model.

Google would be much better off if they ditched their dogshit CLI and let us use the generous login quota from any client.

To be fair, most of the time, the tools work best with the models trained with those tools in mind, and vice versa.

Not to mention that not all models/inference work the same way, so you can't really replicate the same experience. For example, the new Harmony format means you can now inject messages while GPT-OSS is running inference, but obviously Claude Code doesn't support that, because their models don't support it.

>most of the time, the tools work best with the models trained with those tools in mind

This is a garbage state of affairs though

What do you expect? People building software around models other than the ones they develop themselves? Or people training models for software other than the software they develop themselves?

It's like saying official car repair shops should repair any type of car, not just their brand. That's just not how the real world works.

Or they just don't even do it? Michelin tyres aren't best fitted by a Michelin shop, those don't exist. You go to KwikFit or whatever you think's best, and get Michelin or Continental or Pirelli or whatever you think's best fitted.

I agree, Gemini Pro is a great model for coding if you don't need to do agentic work. I've found that it's a lot less "wordy" when editing, debugging, reviewing, etc. It gets to the point, whereas other models can provide long, useless explanations. It's also very smart and great with long context.

All LLMs and agents have stupid issues like this.

GPT-5 insisted on using bash commands to edit a file, despite there being a dedicated tool for that. The problem was that the bash tool it used wrapped at 80 chars, splitting some strings across lines, which then broke the code at the syntax level. It was never able to recover; I was not impressed with GPT-5.
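To make that failure concrete: hard-wrapping a line of code at 80 columns can split a string literal across two lines, which is a syntax error in most languages. A small simulation, using Python's standard `textwrap` module as a stand-in for whatever did the wrapping (the tool isn't named):

```python
import textwrap

def parses(source: str) -> bool:
    """Return True if the source is valid Python syntax."""
    try:
        compile(source, "<edited>", "exec")
        return True
    except SyntaxError:
        return False

original = (
    'greeting = "This is a fairly long string literal that a naive tool '
    'will hard wrap at eighty columns"\n'
)
# Simulate an output channel that hard-wraps every line at 80 columns.
wrapped = "\n".join(textwrap.wrap(original, width=80)) + "\n"

print(parses(original))  # True
print(parses(wrapped))   # False: the string literal now spans two lines
```

A model that only sees the wrapped text may keep "fixing" the wrong thing, which fits the inability to recover described above.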

I have had these exact issues a lot with codex (gpt-5-codex).

I second this man’s take. I’ve been using it consistently for a few months to give it a fair try, and it’s definitely subpar. It can give really good answers at times, but it isn’t worth the time, energy, or luck it takes to get there.