| HN Mirror


by punkpeye 428 days ago

It is not without rate limits, but we do have elevated limits for our accounts through:

https://glama.ai/models/gemini-2.5-flash-preview-04-17

So if you just want to run evals, that should do it.

Though the first couple of days after a model comes out are usually pretty rough because everyone tries to run their evals.

2 comments

punkpeye 428 days ago

What I am noticing with every new Gemini model that comes out is that the time to first token (TTFT) is not great. I guess it is because they gradually transfer compute capacity from old models to new models as the demand increases.
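
For anyone who wants to put a number on this, here is a minimal sketch of how TTFT could be measured on any streaming response. It is not tied to a particular provider: `measure_ttft` and `fake_stream` are hypothetical names, and `fake_stream` only stands in for a real streaming client by sleeping before the first token.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Consume a token stream; return (ttft_seconds, total_seconds, token_count)."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # Time elapsed until the first token arrived.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, count


def fake_stream(first_delay: float, token_delay: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: slow first token, fast rest."""
    time.sleep(first_delay)
    for i in range(n):
        if i:
            time.sleep(token_delay)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream(0.05, 0.001, 20))
    print(f"TTFT {ttft:.3f}s, total {total:.3f}s, {n} tokens")
```

With a real client you would pass the provider's streaming iterator in place of `fake_stream`, and repeat the measurement several times, since a single sample is noisy.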

Filligree 428 days ago

If you’re imagining that 2.5 Pro gets dynamically loaded during the time to first token, then you’re vastly overestimating what’s physically possible.

It’s more likely a latency-throughput tradeoff. Your query might get put inside a large batch, for example.
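
To illustrate the batching point, here is a toy model (not a description of how Google actually serves Gemini). It assumes requests arrive at a fixed interval, the server waits until a batch is full, and a forward step costs a fixed overhead plus a small per-request cost. All names and numbers are made up for illustration, and the model only holds while the arrival rate stays below the server's capacity.

```python
from typing import Tuple


def batch_tradeoff(
    batch_size: int,
    arrival_interval: float,
    base_step: float,
    per_item: float,
) -> Tuple[float, float]:
    """Return (mean_ttft_seconds, capacity_requests_per_second) for a toy server.

    Assumptions (all hypothetical):
    - one request arrives every `arrival_interval` seconds,
    - the server starts a step only once `batch_size` requests are queued,
    - a step takes `base_step + per_item * batch_size` seconds,
    - the server is idle when the batch fills (light load).
    """
    step_time = base_step + per_item * batch_size
    # Peak throughput if batches ran back to back.
    capacity = batch_size / step_time
    # The i-th request in a batch waits for the remaining (batch_size - 1 - i)
    # arrivals, so the mean fill wait is half the time to fill the batch.
    mean_fill_wait = arrival_interval * (batch_size - 1) / 2
    mean_ttft = mean_fill_wait + step_time
    return mean_ttft, capacity


if __name__ == "__main__":
    for b in (1, 8, 32):
        ttft, cap = batch_tradeoff(b, arrival_interval=0.1, base_step=0.05, per_item=0.002)
        print(f"batch={b:>2}  mean TTFT={ttft:.3f}s  capacity={cap:.1f} req/s")
```

Larger batches raise the requests-per-second ceiling but make each individual query wait longer before its first token, which is the tradeoff described above.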

Layvier 428 days ago

That's very interesting, thanks for sharing!