Hacker News
by cootsnuck 300 days ago
Super helpful to see actual examples of what deploying production inference workloads can (roughly) look like, along with the latest optimization efforts.

I consult in this space and clients still don't fully understand how complex it can get to just "run your own LLM".