by jakestevens2 95 days ago
Since you're using GH200s for these optimizations, you're restricted to single-device workloads (since the GH series is an SoC architecture). Kimi K2 (and many other large MoE models) requires multiple devices. Does that mean you can't scale these optimizations to multi-device workloads?

Hey Jack, we use GB200s for these workloads. Feel free to check those big models out on our site! We are doing Kimi, GLM, Minimax, etc.
Nice! But that doesn’t answer the question. Do these optimizations scale to multi-device workloads or not?