by cotran2 355 days ago
The model is compact (1.5B parameters), so most GPUs can serve it locally with <100 ms end-to-end latency. On L40s, it's 50 ms.
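Here is a back-of-envelope check on the "most GPUs can serve it locally" point: a rough sketch of the weight memory a 1.5B-parameter model needs at a few common precisions. The precision list is an assumption, since the comment doesn't say which one the model is served in. The estimate also covers weights only and ignores KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a 1.5B-parameter model.
# Assumption: weights only; KV cache, activations, and framework
# overhead are NOT included, so real usage will be higher.
PARAMS = 1.5e9

# Bytes per parameter for some common (assumed) serving precisions.
BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16/bf16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return the memory taken by the weights, in GB (10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>10}: {weight_memory_gb(PARAMS, nbytes):.2f} GB")
```

At fp16/bf16 this comes to about 3 GB of weights, which fits comfortably on most consumer and datacenter GPUs.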