by Layvier 426 days ago
Agreed, it's not even possible to run an eval dataset. If someone from Google sees this, please at least increase the burst rate limit.

It is not without rate limits, but we do have elevated limits for our accounts through:

https://glama.ai/models/gemini-2.5-flash-preview-04-17

So if you just want to run evals, that should do it.

That said, the first couple of days after a model comes out are usually pretty rough, because everyone tries to run their evals at once.

What I notice with every new Gemini model is that the time to first token (TTFT) is poor. My guess is that Google gradually shifts compute capacity from older models to the new one as demand grows.
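If you want to put a number on TTFT yourself, a minimal sketch is to time how long a streaming response takes to yield its first chunk. This assumes the response is exposed as a Python iterable that issues the request lazily (on first iteration); `fake_stream` below is a hypothetical stand-in, not any real Gemini or Glama client.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, n_chunks) for a streaming response.

    Assumes `stream` starts the request lazily, so the clock below covers
    the whole wait for the first chunk.
    """
    start = time.perf_counter()
    ttft = None
    n_chunks = 0
    for _chunk in stream:
        if ttft is None:
            # First chunk arrived: this is the time to first token.
            ttft = time.perf_counter() - start
        n_chunks += 1
    total = time.perf_counter() - start
    if ttft is None:
        # Empty stream: no first token ever arrived, report total time.
        ttft = total
    return ttft, total, n_chunks


def fake_stream(first_delay: float, per_chunk: float, chunks: int) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    time.sleep(first_delay)  # queueing + prefill before the first token
    for i in range(chunks):
        if i:
            time.sleep(per_chunk)  # decode time between later chunks
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream(0.05, 0.01, 5))
    print(f"TTFT={ttft * 1000:.0f} ms, total={total * 1000:.0f} ms, chunks={n}")
```

Tracking TTFT separately from total time helps tell a slow start (queueing, batching) apart from slow generation.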
If you’re imagining that 2.5 Pro gets dynamically loaded during the time to first token, then you’re vastly overestimating what’s physically possible.

It’s more likely a latency-throughput tradeoff. Your query might get put inside a large batch, for example.
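To make the batching point concrete, here is a toy cost model, not how Google actually schedules requests. It assumes requests arrive at a fixed interval, the server waits until a batch of `batch_size` is full, and each batched step costs a fixed overhead plus a small per-request cost. All parameter values are made up for illustration.

```python
from typing import Tuple


def batch_tradeoff(
    batch_size: int,
    interarrival_s: float,
    base_step_s: float,
    per_item_s: float,
) -> Tuple[float, float]:
    """Toy model: return (avg_first_token_latency_s, peak_throughput_req_per_s).

    Assumptions (hypothetical): fixed arrival interval, the batch is dispatched
    only when full, and step time = base overhead + per-item cost * batch size.
    """
    step_s = base_step_s + per_item_s * batch_size
    # Request i (0-indexed) in a batch waits (batch_size - 1 - i) intervals for
    # the batch to fill; averaged over the batch that is (batch_size - 1) / 2.
    avg_fill_wait_s = (batch_size - 1) * interarrival_s / 2
    latency_s = avg_fill_wait_s + step_s
    # The fixed overhead is shared by the whole batch, so throughput rises.
    throughput = batch_size / step_s
    return latency_s, throughput


if __name__ == "__main__":
    for b in (1, 8, 32):
        lat, thr = batch_tradeoff(b, interarrival_s=0.002, base_step_s=0.05, per_item_s=0.001)
        print(f"batch={b:>2} latency={lat * 1000:.0f} ms throughput={thr:.0f} req/s")
```

Under these assumptions, bigger batches amortize the fixed cost and serve many more requests per second, while each individual request waits longer for its first token.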

That's very interesting, thanks for sharing!