Hacker News
by daft_pink 317 days ago
I’m really curious how well they perform with a long chat history. I find that Gemini often gets confused when the context is long enough and starts responding to prior prompts, whether I’m using the CLI or its Gem chat window.

In my experience, Gemini is REALLY bad about context blending. It can't keep track of what I said and what it said, even in a conversation under 200K tokens. It blends concepts and statements together, then refers to some fabricated hybrid fact or comment.

Gemini has done this in ways that I haven't seen in the recent or current generation models from OpenAI or Anthropic.

It really surprised me that Gemini performs so well in multi-turn benchmarks, given that tendency.

I’ve not tested the recent models for this, but older Gemini models were awful at it: they’d lie about what I’d said or what was in their system prompt, even in short conversations.