	by a6kme 141 days ago
	Hello. The latency depends mainly on the models you pick for reasoning. If you colocate the models by self-hosting them on GPUs, the latency between bot and user turns can be as low as 500-600 ms. With models like Gemini-2.5-flash, it is around 800-1000 ms. It can be higher with reasoning models and larger models, like gpt-4.1.