by Filligree
426 days ago
If you’re imagining that 2.5 Pro gets dynamically loaded during the time to first token, then you’re vastly overestimating what’s physically possible. It’s more likely a latency-throughput tradeoff. Your query might get put inside a large batch, for example.
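
To put rough numbers on the batching idea, here is a toy model. Everything in it is an assumption for illustration: the function name `batch_tradeoff`, the arrival interval, the step time, and the simplification that one forward pass costs the same regardless of batch size. It says nothing about how any real provider actually schedules requests.

```python
def batch_tradeoff(batch_size, arrival_interval_s=0.05, step_time_s=0.5):
    """Toy latency/throughput model for request batching.

    Assumptions (hypothetical, not measured):
      - requests arrive at a fixed interval of arrival_interval_s
      - the server waits until batch_size requests have arrived
      - one forward pass takes step_time_s no matter how large the batch is
    """
    # The first request in a batch waits for the other (batch_size - 1) arrivals.
    worst_case_wait_s = (batch_size - 1) * arrival_interval_s
    # Worst-case time to first token: queueing delay plus one forward pass.
    time_to_first_token_s = worst_case_wait_s + step_time_s
    # Requests served per second of accelerator compute time.
    throughput_per_compute_s = batch_size / step_time_s
    return time_to_first_token_s, throughput_per_compute_s


if __name__ == "__main__":
    for size in (1, 8, 32):
        ttft, throughput = batch_tradeoff(size)
        print(f"batch={size:>2}  worst-case TTFT={ttft:.2f}s  "
              f"throughput={throughput:.0f} req per compute-second")
```

Under these made-up numbers, larger batches serve more requests per unit of compute while the earliest request in each batch waits longer for its first token, which is the tradeoff described above.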