Hacker News
Real-time LLM Inference on Standard GPUs (3k tokens/s per request) (blog.kog.ai)
7 points by morgangiraud 27 days ago