| HN Mirror



	by danlenton 798 days ago
	The neural scoring adds ~20ms of latency, but this only affects time-to-first-token (TTFT), not inter-token latency (ITL). Our public endpoints add a further ~150ms, but you can deploy the router on-prem in your own cloud, in which case that extra ~150ms goes away and only the inference latency remains. Generally the improvements in ITL outweigh the small addition to TTFT.
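
	To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model where total response time is TTFT plus ITL for each token after the first. The ~20ms and ~150ms overheads are the figures above; the 5ms-per-token ITL saving and the function names are purely hypothetical, chosen only for illustration.

```python
import math


def total_latency_ms(ttft_ms: float, itl_ms: float, n_tokens: int) -> float:
    """Simple model: the first token costs TTFT, each later token costs ITL."""
    return ttft_ms + itl_ms * (n_tokens - 1)


def break_even_tokens(overhead_ms: float, itl_saving_ms: float) -> int:
    """Smallest response length at which the per-token ITL saving
    has repaid the extra TTFT added by routing."""
    return math.ceil(overhead_ms / itl_saving_ms) + 1


# Hypothetical ITL saving per token; NOT a measured figure.
ITL_SAVING_MS = 5.0

# Public endpoint: ~20ms scoring + ~150ms additional latency.
public_break_even = break_even_tokens(20 + 150, ITL_SAVING_MS)

# Self-hosted router: only the ~20ms scoring overhead.
on_prem_break_even = break_even_tokens(20, ITL_SAVING_MS)

print(public_break_even, on_prem_break_even)
```

	Under these assumed numbers, any response longer than a few dozen tokens already comes out ahead on the public endpoint, and a self-hosted router breaks even after only a handful of tokens. The real break-even point depends entirely on the ITL difference between the models being routed to.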