Hacker News
Real-time LLM Inference on Standard GPUs (3k tokens/s per request) (blog.kog.ai)
7 points by morgangiraud 27 days ago