by sunaookami 239 days ago
Gemini doesn't seem to be trained on tool use (which Claude is), so it quite often thinks it can't do something it certainly can, and does a lot of nonsense. For me it fails nearly every time it tries to read project files, because it uses relative paths instead of absolute ones, so I've put 'For your "ReadFile" and "WriteFile" tool, you MUST use absolute paths to files' in my system instructions.
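
For anyone wiring up their own tools, the same rule can be enforced on the tool side instead of in the prompt. This is only a minimal sketch: `to_absolute` and `project_root` are hypothetical names for illustration, not part of the Gemini CLI or any real agent's API.

```python
from pathlib import Path


def to_absolute(path_arg: str, project_root: str) -> str:
    """Resolve a possibly relative path from the model against the project root.

    Hypothetical helper: call it on every path argument before handing it
    to a file-reading or file-writing tool.
    """
    path = Path(path_arg)
    if not path.is_absolute():
        # Relative paths are interpreted relative to the project root.
        path = Path(project_root) / path
    # resolve() collapses "." and ".." segments; the file need not exist.
    return str(path.resolve())
```

With something like this in front of the tool, a relative `src/main.py` from the model is turned into an absolute path before any file access happens, and an already absolute path passes through unchanged.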

Speaking of system instructions, Gemini always forgets them or doesn't follow them. And it still puts code comments nearly everywhere, it drives me nuts.

Codex is much better at following system instructions but the CLI is... very bad.

4 comments

My experience with Gemini 2.5 Pro has oddly been better, maybe because I use RooCode/Cline? It was strangely apologetic, though, wasting tokens lamenting its failure whenever it failed to do something, instead of just getting on with the solution.

At the same time, even the big versions of Qwen3 Coder (480B) regularly mess up file paths and use the wrong path separators, leading to files like srccomponentsMyComponent.vue being created instead of src/components/MyComponent.vue.
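
A minimal sketch of guarding against the separator problem on the tool side, assuming the path still contains its backslashes when it reaches the tool. `normalize_separators` is a hypothetical name, and the sketch assumes no file in the project legitimately has a backslash in its name.

```python
from pathlib import PureWindowsPath


def normalize_separators(path_arg: str) -> str:
    """Convert a path that may use "\\" separators to forward slashes."""
    # PureWindowsPath treats both "\" and "/" as separators on any OS,
    # and as_posix() writes the result back out with "/" only.
    return PureWindowsPath(path_arg).as_posix()
```

This only helps before the backslashes get swallowed somewhere along the way; once the path has already collapsed into srccomponentsMyComponent.vue there is nothing left to normalize.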

> And it still puts code comments nearly everywhere, it drives me nuts.

I’ve had various models insert comments like “// removed Foo”, even though it makes no sense to mark the absence of code that is no longer there.

At the same time, sometimes the LLMs love to eat my comments when doing changes and leave behind only the code.
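
A rough way to catch the comment-eating automatically is to diff the comments before and after an edit. This is only a sketch under assumptions: it handles Python source only (via the standard `tokenize` module), both versions must tokenize cleanly, and `extract_comments` and `dropped_comments` are hypothetical helper names.

```python
import io
import tokenize


def extract_comments(source: str) -> list[str]:
    """Return every comment token in a piece of Python source, in order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


def dropped_comments(before: str, after: str) -> list[str]:
    """List comments present in `before` that no longer appear in `after`."""
    remaining = set(extract_comments(after))
    return [c for c in extract_comments(before) if c not in remaining]
```

Running this on the file before and after a model's change gives a list of comments that went missing, which can then be reviewed or fed back to the model.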

How silly (and annoying). It’s good to be able to try out multiple models with the exact same prompts though, maybe I should create my own custom mode for RooCode with all of the important stuff I want baked in.

Codex doesn’t give feedback while it’s running. It just works quietly in a way that’s not easy to interrupt if you could see it going off the rails.

Claude is better at this.

Set these in the config.toml for codex and you'll get a lot more info while it's running:

    model_reasoning_summary = "detailed"
    model_verbosity = "high"
    model_supports_reasoning_summaries = true
    show_raw_agent_reasoning = true

> Speaking of system instructions, Gemini always forgets them or doesn't follow them. And it still puts code comments nearly everywhere, it drives me nuts.

Yup, I've tried to use Gemini so many times, but its inability to strictly follow system prompts makes it so hard to get useful output that doesn't need to be cleaned up. Code comments are just short of impossible to get rid of; they must have trained it only on code that has comments, because the model really likes to add them everywhere.

Every agent+model combination has issues right now, I'm personally swapping between them depending on the task.

Gemini is great for stuff you need fast and don't care about the quality, as you can just throw it away.

Claude Code + Sonnet is great in many ways and follows prompts way better, but has a tendency to go off on tangents and really get lost in the woods. It requires hand-holding: you basically have to interrupt it as soon as you see something weird, to steer it in the right direction. Complex stuff has to be aggressively split into smaller validated sub-tasks manually. It also tends to stop on its own to ask, "Well, we've done half now, you want me to continue with the other half?"

Codex + GPT-5 is the best at following prompts and produces the highest quality code, but is way slower than the others. It still struggles with seemingly arbitrary stuff, yet is able to solve complex tasks by itself without any hand-holding. It can get stuck on something obvious, but at least it won't run off on its own, and it'll complete everything as well as it can, even if it takes 30 minutes.

Qwen Coder seems outright unusable; I haven't been able to use it for anything good at all.

Tried AMP for a while as well, nice UI and model seems good, but too expensive (and I say this as someone who currently gives $200/month to OpenAI).

Gemini seems to have a poor model of both what it can and what it is allowed to do.

I’ve noticed the latter with several image generation refusals that I could eventually talk it out of quite easily (usually by mentioning fair use in a copyright/trademark context).

> Gemini seems to have a poor model of both what it can and what it is allowed to do.

Starting to feel like LLMs are more a representation of the culture of the company training them than a fair representation of the world at large.

ConwAI’s law?