by bjackman 157 days ago
Lately with Gemini CLI / Jules it doesn't seem like time spent is a good proxy for difficulty. It has a big problem with getting into loops of "I am preparing the response for the user. I am done. I will output the answer. I am confident. Etc etc".

I see this directly in Gemini CLI, as the harness detects loops and bails out of the reasoning. But I've also occasionally seen it take 15m+ to do trivial stuff, and I suspect that's a symptom of a similar issue.
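For anyone curious what "detects loops" might mean in practice: Gemini CLI's actual detection logic isn't described here, so this is only a minimal Python sketch of one plausible heuristic. It assumes a "loop" is the same sentence repeating several times within a short sliding window of output. The function names, the window size of 20 sentences, and the threshold of 3 repeats are all hypothetical choices for illustration.

```python
import re
from collections import Counter, deque


def split_sentences(text):
    """Naively split model output into sentences on ., ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


def detect_loop(chunks, window=20, threshold=3):
    """Hypothetical heuristic, NOT Gemini CLI's real algorithm.

    Return True if any normalized chunk appears `threshold` or more times
    within the last `window` chunks seen.
    """
    recent = deque(maxlen=window)  # sliding window of normalized chunks
    counts = Counter()             # occurrences of each chunk inside the window
    for chunk in chunks:
        # Normalize case and whitespace so trivial variations still match.
        key = " ".join(chunk.lower().split())
        if not key:
            continue
        if len(recent) == window:
            # The deque is full, so the leftmost item is evicted on append;
            # keep the counter in sync with the window contents.
            counts[recent[0]] -= 1
        recent.append(key)
        counts[key] += 1
        if counts[key] >= threshold:
            return True
    return False
```

A harness could run a check like this over the streamed reasoning and cancel the request once it returns True. A repetition count is cheap and model-agnostic, but it would miss loops where the wording drifts slightly each time.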

3 comments

I've noticed that, using Antigravity and VS Code, Gemini 3 Pro often comes back with "model too busy" or something like that, basically a 500.

Seems like a capacity issue, because it works a lot better late at night.

I don't see the same with the Claude models in Antigravity.

I've noticed that too, and I've also noticed that it starts to struggle when the workspace "tab" you're working in gets longer: it basically gets stuck at "Starting agent ...". I initially thought the model must be struggling with a very big context, but since restarting the "app" with kill -9 fixes it, it looks like a local issue. Strange.
Anecdotally, I notice better performance and output quality across most providers outside of 8a-5p ET.
Yeah that's a separate issue though, it predates the time when the looping issues got really common, for me at least.
I saw this too. Sometimes it "thinks" inside the actual output, and it's much more likely to end up in the "I am ready to answer" loop while it's doing that.
I feel like sometimes it just loops those messages when it isn't actually generating new tokens. But I might be wrong.
There are some other failure modes that all feel kinda vaguely related that probably help with building a hypothesis about what's going wrong:

Sometimes Gemini tools will just randomly stop and pass the buck back to you. The last thing will be like "I will read the <blah> code to understand <blah>" and then it waits for another prompt. So I just type "continue" and it starts work again.

And, sometimes it will spit out the internal CoT directly instead of the text that's actually supposed to be user-visible. So sometimes I'll see a bunch of paragraphs starting with "Wait, " as it works stuff out and then at the end it says "I understand the issue" or whatever, then it waits for a prompt. I type "summarise" and it gives me the bit I actually wanted.

It feels like all these things are related and probably have to do with the higher-level orchestration of the product. Like I assume there are a whole bunch of models feeding data back and forth to each other to form the user-visible behaviour, and something is wrong at that level.

At one point it started spitting out its CoT in the comments of the code it’s supposed to be changing.
Ah yeah I've seen that too. Definitely seems related.

I suspect this is also something like the "inverse" of a prompt hijacking situation. Basically it's losing track of where its output is flowing to (whereas prompt injection is when it loses track of where its input is flowing from).