| HN Mirror

by bayarearefugee 149 days ago

I mostly use Gemini, so I can't speak for Claude, but Gemini definitely has variable quality at different times, though I've never bothered to try to find a specific time-of-day pattern to it.

The most reliable time to see it fall apart is when Google makes a public announcement that is likely to cause a sudden influx of people using it.

There are multiple levels of failure. First you start seeing iffy responses of obviously lesser quality than usual. Then, if things get really bad, you start seeing random errors where Gemini suddenly loses all of its context (even in a new chat) or starts failing at the UI level by not bothering to finish answers, etc.

The obvious likely reason is that when the models are under high load, the service probably does a kind of dynamic load balancing, falling back to lighter models or limiting the time/resources allowed for any particular prompt.

2 comments

kevinsync 149 days ago

I suspect they might transparently fall back too; Opus 4.5 has been really reasonable lately, except right after it launched, and also surrounding any service interruptions / problems reported on status.claude.ai -- once those issues resolve, for a few hours the results feel very "Sonnet", and it starts making a lot more mistakes. When that happens, I'll usually just pause Claude and prompt Codex and Gemini with the same issue to see what comes out of the black hole. Then, a bit later, Claude mysteriously regains its wits.

I just assume it went to the bar, got wasted, and needed time to sober up!

astrange 149 days ago

They don't ever fall back to cheaper models silently.

What Anthropic does do is poke the model to tell you to go to bed if you use it too long ("long conversation reminder") which distracts it from actually answering.

Sometimes the models do have associations with things like the day of the year, and might be lazier in some months than others.

pankajdoharey 148 days ago

If they are real slimeballs, they can justify it by saying: you see, we use speculative decoding, so we use a smaller, faster model first and then the answer is enhanced by the larger model, blah blah... "For the best user experience."

scaredreally 149 days ago

Precisely. Once I point out the fact that it is doing this, it seems to produce better results for a bit before going back to the same.

I jokingly (and not so) thought that it was trained on data that made it think it should be tired at the end of the day.

But it is happening daily and at night.

woleium 149 days ago

I find it helps to tell it to take some stimulants

stavros 149 days ago

I didn't believe such conspiracy theories, until one day I noticed Sonnet 4.5 (which I had been using for weeks to great success) perform much worse, very visibly so. A few hours later, Opus 4.5 was released.

Now I don't know what to think.

pankajdoharey 148 days ago

Model router.

pankajdoharey 148 days ago

It's the router they are using; we surely are not getting what we select. Also, after a few queries the intelligence drops abruptly and doesn't recover even after we start a new session, so there is another internal quota at play.
