by punkpeye
426 days ago
What I am noticing with every new Gemini model that comes out is that the time to first token (TTFT) is not great. I guess it is because they gradually shift compute from old models to new models as demand increases.
It’s more likely a latency-throughput tradeoff. Your query might get put inside a large batch, for example.
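
To make the batching point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the function names, the cost model (a fixed overhead per batch plus a per-request cost), and the numbers are made up, and it says nothing about how Gemini is actually served. It only shows why waiting to fill a larger batch can raise serving capacity while making TTFT worse.

```python
def mean_ttft_ms(arrival_interval_ms, batch_size, fixed_ms, per_request_ms, n_requests):
    """Mean time from arrival to first token, in ms, under a toy batching model.

    Assumed model: requests arrive at a fixed interval, the server waits
    until a batch is full, and a batch of k requests takes
    fixed_ms + per_request_ms * k before any request gets its first token.
    """
    arrivals = [i * arrival_interval_ms for i in range(n_requests)]
    server_free_at = 0.0
    ttfts = []
    for start in range(0, n_requests, batch_size):
        batch = arrivals[start:start + batch_size]
        # The batch starts once its last request has arrived and the server is idle.
        launch = max(batch[-1], server_free_at)
        done = launch + fixed_ms + per_request_ms * len(batch)
        server_free_at = done
        ttfts.extend(done - arrived for arrived in batch)
    return sum(ttfts) / len(ttfts)


def capacity_rps(batch_size, fixed_ms, per_request_ms):
    """Peak requests per second the server can sustain with full batches."""
    batch_ms = fixed_ms + per_request_ms * batch_size
    return batch_size / batch_ms * 1000.0


# Hypothetical numbers: one request every 100 ms, 50 ms fixed cost per batch,
# 5 ms extra per request in the batch.
small_ttft = mean_ttft_ms(100, 1, 50, 5, 64)
large_ttft = mean_ttft_ms(100, 8, 50, 5, 64)
small_cap = capacity_rps(1, 50, 5)
large_cap = capacity_rps(8, 50, 5)

print(f"batch=1: mean TTFT {small_ttft:.0f} ms, capacity {small_cap:.1f} req/s")
print(f"batch=8: mean TTFT {large_ttft:.0f} ms, capacity {large_cap:.1f} req/s")
```

In this model the larger batch sustains several times more requests per second, but early arrivals sit waiting for the batch to fill, so mean TTFT goes up. Real servers typically use smarter schedulers (timeouts, continuous batching), so treat this as the shape of the tradeoff, not a measurement.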