Note that in the first test (the only one where output length is reported), Gemini Pro returned more than 3x the amount of text in less than 2x the time. In my experience with Gemini, that time was probably spent mainly on thinking, the length of which is not reported here. So looking at pure output TPS, Gemini is faster, but without clear info on the thinking time/length, it's impossible to judge.
> Claude’s overall response was consistently around 500 words—Flash and Pro delivered 3,372 and 1,591 words by contrast.
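As a rough check on those numbers, here is a back-of-the-envelope sketch in Python. It uses the word counts from the quote and treats "less than 2x the time" as an upper bound; the article's exact timings are not reproduced here, and words are only a proxy for tokens, so take the result as a loose lower bound rather than a measurement.

```python
# Word counts quoted from the article; Claude's "around 500" is approximate.
CLAUDE_WORDS = 500
GEMINI_PRO_WORDS = 1591

# Assumption: Gemini Pro took less than 2x as long as Claude, so 2.0 is
# an upper bound on the time ratio, not a measured value.
MAX_TIME_RATIO = 2.0


def min_throughput_ratio(words_a, words_b, max_time_ratio):
    """Lower bound on (b's words/sec) / (a's words/sec),
    given that b took less than max_time_ratio times as long as a."""
    return (words_b / words_a) / max_time_ratio


ratio = min_throughput_ratio(CLAUDE_WORDS, GEMINI_PRO_WORDS, MAX_TIME_RATIO)
print(f"Gemini Pro words/sec is at least {ratio:.2f}x Claude's")
```

This counts all of the wall-clock time against visible output. If part of that time went to unreported thinking, the real decode rate would be higher still.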
It isn't clear from the article whether the time they quote is time-to-first-token or time-to-completion. If it is the latter, then it makes sense that the Gemini models would take longer even with similar token throughput.
Output tokens must be generated in order (autoregressive decoding). Inputs don't have that constraint, so prefill runs in parallel. With stronger kernels, KV-cache handling, and batching, Claude can outrun Gemini.
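To make the prefill/decode distinction concrete, here is a deliberately simplified latency model in Python. Every number in it is hypothetical and chosen only for easy arithmetic; none of them are measurements of Claude or Gemini. It also ignores thinking tokens, queueing, and batching effects.

```python
def time_to_first_token(prompt_tokens, prefill_tps):
    """Prefill processes the whole prompt in parallel, so it is modeled
    as a single pass at a high tokens/sec rate."""
    return prompt_tokens / prefill_tps


def time_to_completion(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Decode is sequential: each output token waits for the previous one,
    so total time grows linearly with output length."""
    ttft = time_to_first_token(prompt_tokens, prefill_tps)
    return ttft + output_tokens / decode_tps


# Hypothetical figures: same prompt, same speeds, different output lengths.
PROMPT_TOKENS = 2000
PREFILL_TPS = 4000.0   # assumed prefill throughput
DECODE_TPS = 50.0      # assumed decode throughput

ttft = time_to_first_token(PROMPT_TOKENS, PREFILL_TPS)
short_reply = time_to_completion(PROMPT_TOKENS, 700, PREFILL_TPS, DECODE_TPS)
long_reply = time_to_completion(PROMPT_TOKENS, 2100, PREFILL_TPS, DECODE_TPS)

print(f"time to first token: {ttft} s")
print(f"700-token reply:     {short_reply} s")
print(f"2100-token reply:    {long_reply} s")
```

Under these assumptions both replies start streaming at the same moment, but the 3x longer one finishes almost 3x later. So a model can look "slower" on time-to-completion while having identical token throughput, which is why it matters which of the two the article measured.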