Hacker News
by punkpeye 428 days ago
It is not without rate limits, but we do have elevated limits for our accounts through:

https://glama.ai/models/gemini-2.5-flash-preview-04-17

So if you just want to run evals, that should do it.

Though the first couple of days after a model comes out are usually pretty rough, because everyone tries to run their evals at once.


What I am noticing with every new Gemini model that comes out is that the time to first token (TTFT) is not great. I guess it is because they gradually shift compute from old models to new models as demand increases.
If you’re imagining that 2.5 Pro gets dynamically loaded during the time to first token, then you’re vastly overestimating what’s physically possible.

It’s more likely a latency-throughput tradeoff. Your query might get put inside a large batch, for example.
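A toy model can make the tradeoff concrete. The sketch below assumes a server that waits until `batch_size` requests are queued, then runs one step whose cost is a fixed overhead plus a small per-request term. All names and numbers (`simulate`, `arrival_interval`, `fixed_cost`, `per_request_cost`) are hypothetical and say nothing about how Gemini is actually served; the point is only that bigger batches raise capacity while each request waits longer before its first token.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    batch_size: int
    capacity: float      # requests/second the server can process if kept busy
    mean_latency: float  # seconds from arrival until its batch's step finishes


def simulate(batch_size: int,
             arrival_interval: float = 0.01,    # hypothetical: one request every 10 ms
             fixed_cost: float = 0.2,           # hypothetical per-step overhead (s)
             per_request_cost: float = 0.005,   # hypothetical extra cost per request (s)
             ) -> BatchStats:
    """Toy batching model (assumption, not a real serving system).

    Requests arrive at a steady rate; the server waits until `batch_size`
    requests are queued, then runs one step costing
    fixed_cost + per_request_cost * batch_size seconds. The step stands in
    for prefill, i.e. the moment the first token becomes available.
    Queueing behind earlier batches is ignored.
    """
    step_time = fixed_cost + per_request_cost * batch_size
    # The i-th arrival in a batch waits for the (batch_size - 1 - i) arrivals
    # after it, so the average fill wait is (batch_size - 1) / 2 intervals.
    mean_fill_wait = (batch_size - 1) / 2 * arrival_interval
    return BatchStats(
        batch_size=batch_size,
        capacity=batch_size / step_time,
        mean_latency=mean_fill_wait + step_time,
    )


if __name__ == "__main__":
    for b in (1, 8, 32):
        s = simulate(b)
        print(f"batch={s.batch_size:3d}  capacity={s.capacity:6.1f} req/s  "
              f"mean latency={s.mean_latency:.3f} s")
```

Because the fixed overhead is amortized over more requests, capacity grows with batch size, but every request also pays the time spent waiting for the batch to fill, so per-request latency grows too.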

That's very interesting, thanks for sharing!