	by Layvier 426 days ago
	Agreed, it's not even possible to run an eval dataset. If someone from Google sees this, please at least increase the burst rate limit.


punkpeye 426 days ago

It is not without rate limits, but we do have elevated limits for our accounts through:

https://glama.ai/models/gemini-2.5-flash-preview-04-17

So if you just want to run evals, that should do it.

Though the first couple of days after a model comes out are usually pretty rough because everyone tries to run their evals.


punkpeye 426 days ago

What I am noticing with every new Gemini model that comes out is that the time to first token (TTFT) is not great. I guess it is because they gradually transfer compute capacity from old models to new models as demand increases.
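
For anyone who wants to put a number on TTFT, here is a minimal sketch that times the first chunk of any streaming iterator. `measure_ttft` and `fake_stream` are hypothetical names made up for illustration; `fake_stream` only simulates a delayed response and is not any provider's API.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Return (seconds to first chunk, total seconds, chunk count) for a stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _chunk in stream:
        if ttft is None:
            # First chunk arrived: this elapsed time is the TTFT.
            ttft = time.perf_counter() - start
        count += 1
    return ttft, time.perf_counter() - start, count


def fake_stream(first_delay_s: float, n_chunks: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_delay_s)  # simulated queueing and prefill before any output
    for i in range(n_chunks):
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream(0.2, 5))
    print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, chunks: {n}")
```

In practice you would pass the streaming iterator from your client library instead of `fake_stream`, and average over several requests, since a single measurement is noisy.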


Filligree 426 days ago

If you’re imagining that 2.5Pro gets dynamically loaded during the time to first token, then you’re vastly overestimating what’s physically possible.

It’s more likely a latency-throughput tradeoff. Your query might get put inside a large batch, for example.
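
To make the tradeoff concrete, here is a toy model, not a description of how Google actually serves Gemini. It assumes requests arrive evenly and are held until a fixed-size batch fills, and that one forward pass costs a fixed overhead plus a small per-request cost. The function name and all the constants are made up for illustration.

```python
from typing import Tuple


def batch_tradeoff(
    batch_size: int,
    arrival_rate: float,
    base_step_s: float = 0.05,   # assumed fixed cost of one forward pass
    per_item_s: float = 0.002,   # assumed extra cost per request in the batch
) -> Tuple[float, float]:
    """Return (average time to first token in s, peak throughput in requests/s)."""
    # Time for the batch to fill when requests arrive evenly spaced.
    fill_time = (batch_size - 1) / arrival_rate
    # The first request waits fill_time and the last waits 0, so the mean is half.
    avg_wait = fill_time / 2
    # The whole batch shares one forward pass.
    step_time = base_step_s + per_item_s * batch_size
    throughput = batch_size / step_time
    return avg_wait + step_time, throughput


if __name__ == "__main__":
    for size in (1, 8, 32, 128):
        ttft, thr = batch_tradeoff(size, arrival_rate=100.0)
        print(f"batch={size:4d}  avg TTFT={ttft:.3f}s  throughput={thr:.0f} req/s")
```

Under these assumptions, larger batches raise throughput because the fixed cost is amortized, but every request also waits longer before its first token.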


Layvier 426 days ago

That's very interesting, thanks for sharing!
