Hacker News new | ask | show | jobs
by irthomasthomas 317 days ago
So sonnet-4 is faster than gemini-2.5-flash at long context. That is surprising. Especially since Gemini runs on those fast TPUS.
5 comments

Note that (in the first test, the only one where output length is reported), Gemini Pro returned more than 3x the amount of text, at less than 2x the amount of time. From my experience with Gemini, that time was probably mainly spent on thinking, length of which is not reported here. So looking at pure TPS of output, Gemini is faster, but without clear info on the thinking time/length, it's impossible to judge.
if they left them both on defaults, flash is thinking-by-default and sonnet 4 is no-thinking-by-default
> Claude’s overall response was consistently around 500 words—Flash and Pro delivered 3,372 and 1,591 words by contrast.

It isnt clear from the article whether the time they quote is time-to-first-token or time to completion. If it is latter, then it makes sense why gemini* would take longer even with similar token throughput.

Anthropic also uses TPUs for inference.
Do they rent them from Google? Or are they a different brand?
Google provides them.
Ah cool I'll have to read up on that, I had thought that google was hoarding them.
output tokens must be generated in order (autoregressive decoding), inputs don’t have that constraint, so prefill is parallel, with stronger kernels, KV-cache handling, and batching, Claude can outrun Gemini.