by jakestevens2
95 days ago
Since you're using GH200s for these optimizations, you're restricted to single-device workloads (the GH series is an SoC architecture). Kimi K2 (and many other large MoE models) requires multiple devices. Does that mean you can't scale these optimizations to multi-device workloads?