by samhoss93 12 days ago
Great README. Genuinely one of the clearest walkthroughs of inference internals. The KV cache section is worth lingering on, since most OOM and throughput issues trace back to it. How sequence length and batch size fill the cache is normally hard to reason about, and the effects tend to show up only under real traffic.
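For anyone skimming, here is a rough back-of-envelope sketch of why this bites. It assumes a plain per-token K/V cache with no paging, quantization or sharing. The function name and the model shape below are hypothetical and were not taken from the README. Memory grows linearly in both batch size and sequence length, so the two multiply:

```python
GIB = 1024 ** 3

def kv_cache_bytes(batch_size: int, seq_len: int, num_layers: int,
                   num_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold K and V for every token of every sequence in a batch."""
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size

# Hypothetical model shape (assumption, not from the README):
# 32 layers, 32 KV heads of dim 128, fp16 (2 bytes per element).
CONFIG = dict(num_layers=32, num_kv_heads=32, head_dim=128, bytes_per_elem=2)

if __name__ == "__main__":
    for batch_size, seq_len in [(1, 4096), (8, 4096), (8, 32768)]:
        gib = kv_cache_bytes(batch_size, seq_len, **CONFIG) / GIB
        print(f"batch={batch_size:>2} seq_len={seq_len:>6}: {gib:6.1f} GiB")
```

With that shape, each token costs 0.5 MiB. One 4k-token sequence needs 2 GiB, a batch of 8 needs 16 GiB, and the same batch at 32k context needs 128 GiB. A setup that looked fine in short-prompt benchmarks can therefore run out of memory once a few long requests land in the same batch.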

Looking forward to going through the completed course.