Hacker News
by yawnxyz 27 days ago
self-hosting is fine, but even a $100k god box running Opus-level LLMs would still grind to a halt if you tried to run 5-10 parallel inference streams on it