by samus, 806 days ago
You're only correct about Qwen's MoE. I presume that Chinese model builders feel more pressure to use their GPU time efficiently because of sanctions.