	by cotran2 355 days ago
	The model is compact (1.5B parameters), so most GPUs can serve it locally with <100ms end-to-end latency. On L40s, it's 50ms.