|
|
|
|
|
by a6kme
95 days ago
|
|
Hello. The latency is a factor of the models you are picking up for reasoning. If you are colocating the models by self hosting on GPUs, the latency can be as low as 500 - 600 ms between bot - user turns. With models like Gemini-2.5-flash, the latency is around 800-1000 ms. The latency can be higher with reasoning and larger models, like gpt-4.1. |
|