by bitkin_dev
141 days ago
Great breakdown, thanks for writing this up. One thing I’m still unclear on: in real production workloads, which of memory bandwidth, KV cache management, or scheduler overhead became the main bottleneck first? I’m also curious how much of this showed up only under sustained load rather than in benchmarks.