Hacker News
by samus 806 days ago
You're only correct about Qwen's MoE. I presume that Chinese model builders feel more pressure to use their GPU time efficiently because of sanctions.