| HN Mirror



	by irthomasthomas 317 days ago
	So Sonnet 4 is faster than Gemini 2.5 Flash at long context. That is surprising, especially since Gemini runs on those fast TPUs.

5 comments

curl-up 317 days ago

Note that in the first test (the only one where output length is reported), Gemini Pro returned more than 3x the text in less than 2x the time. In my experience with Gemini, that time was probably spent mainly on thinking, the length of which is not reported here. So on pure output TPS, Gemini is faster, but without clear info on the thinking time/length, it's impossible to judge.
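The arithmetic behind that comparison can be sketched as follows. This is only an illustration of the ratio argument: the function name `relative_output_rate` is hypothetical, and the inputs are the bounds quoted above (3x the text, 2x the time), not measured values. Unreported thinking tokens are not accounted for.

```python
def relative_output_rate(text_ratio: float, time_ratio: float) -> float:
    """Output rate of model B relative to model A.

    text_ratio: B's output length divided by A's output length.
    time_ratio: B's response time divided by A's response time.
    """
    return text_ratio / time_ratio


# Bounds from the comment: more than 3x the text in less than 2x the time,
# so the true ratio is above this lower bound.
lower_bound = relative_output_rate(3.0, 2.0)
print(lower_bound)  # 1.5
```

Under these bounds, the visible output rate is at least 1.5x higher, before counting any hidden thinking tokens.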


jbellis 317 days ago

If they left both on defaults, Flash is thinking-by-default and Sonnet 4 is no-thinking-by-default.


bitpush 317 days ago

> Claude’s overall response was consistently around 500 words—Flash and Pro delivered 3,372 and 1,591 words by contrast.

It isn't clear from the article whether the time they quote is time-to-first-token or time to completion. If it is the latter, then it makes sense that the Gemini models would take longer even with similar token throughput.
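A minimal sketch of why the distinction matters, assuming a simple model where time to completion is time-to-first-token plus output length divided by throughput. The word counts come from the quote above and are used as a rough stand-in for tokens; the 1.0 s time-to-first-token and the rate of 100 units per second are made-up values chosen only to hold throughput equal across models.

```python
def time_to_completion(ttft_s: float, output_units: int, units_per_s: float) -> float:
    """Total response time: wait for the first unit, then stream the rest."""
    return ttft_s + output_units / units_per_s


# Hypothetical: identical time-to-first-token and throughput for all three.
TTFT_S = 1.0
RATE = 100.0

# Output lengths in words, taken from the quoted article text.
outputs = {"Claude": 500, "Gemini Pro": 1591, "Gemini Flash": 3372}

for name, words in outputs.items():
    total = time_to_completion(TTFT_S, words, RATE)
    print(f"{name}: {total:.2f} s")
```

With equal throughput, the longer responses take proportionally longer to finish, even though time-to-first-token is identical.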


lugao 317 days ago

Anthropic also uses TPUs for inference.


irthomasthomas 317 days ago

Do they rent them from Google? Or are they a different brand?


ancientworldnow 316 days ago

Google provides them.


irthomasthomas 316 days ago

Ah cool, I'll have to read up on that. I had thought that Google was hoarding them.


netdur 316 days ago

Output tokens must be generated in order (autoregressive decoding). Inputs don't have that constraint, so prefill is parallel. With stronger kernels, KV-cache handling, and batching, Claude can outrun Gemini.
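A toy way to see the asymmetry is to count sequential steps rather than real time. This is a rough sketch under simplifying assumptions: `parallel_width` is a hypothetical number of prompt tokens handled per forward pass, and real serving stacks differ in many ways (chunking, batching across requests, cache reuse).

```python
import math


def prefill_steps(prompt_tokens: int, parallel_width: int) -> int:
    """Sequential passes needed to ingest the prompt.

    Prompt tokens are all known up front, so they can be processed
    parallel_width at a time.
    """
    return math.ceil(prompt_tokens / parallel_width)


def decode_steps(output_tokens: int) -> int:
    """Sequential passes needed to generate the output.

    Each output token depends on the previous one, so generation
    takes one pass per token for a single request.
    """
    return output_tokens


# Hypothetical long-context request: 100,000 prompt tokens, 500 output tokens.
print(prefill_steps(100_000, 4096))  # 25
print(decode_steps(500))             # 500
```

In this toy model the 100,000-token prompt needs far fewer sequential steps than the 500-token answer, which is why output length can dominate the reported time.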
