Hacker News
by am17an, 90 days ago
You can still run larger MoE models by offloading the expert weights to the CPU for token generation. They are by and large usable: I get ~50 tokens/second on a Kimi Linear 48B (3B active) model on a potato PC with a 3090.
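
A rough back-of-envelope sketch of why this can work: token generation is mostly limited by how many weight bytes have to be read per token, and an MoE model only touches its active parameters. The quantization width and memory bandwidth below are assumptions picked for illustration, not measurements from the setup above, and the estimate ignores attention and shared weights kept on the GPU, the KV cache, and compute time.

```python
def est_tokens_per_second(params_read_per_token, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode speed if every weight read per token streams from memory once."""
    bytes_per_token = params_read_per_token * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


# Model sizes taken from the comment: 48B total, 3B active per token.
TOTAL_PARAMS = 48e9
ACTIVE_PARAMS = 3e9

# Hypothetical values, chosen only for illustration.
BYTES_PER_PARAM = 0.5   # assumes roughly 4-bit quantized weights
CPU_BANDWIDTH = 50e9    # assumes about 50 GB/s of system RAM bandwidth

# MoE: only the active parameters are read for each token.
moe_bound = est_tokens_per_second(ACTIVE_PARAMS, BYTES_PER_PARAM, CPU_BANDWIDTH)

# Dense model of the same total size: every parameter is read for each token.
dense_bound = est_tokens_per_second(TOTAL_PARAMS, BYTES_PER_PARAM, CPU_BANDWIDTH)

print(f"MoE bound:   {moe_bound:.1f} tokens/second")
print(f"Dense bound: {dense_bound:.1f} tokens/second")
```

Under these assumptions the MoE bound comes out 16x higher than the dense one (48B / 3B), which is the gap that makes CPU-side experts tolerable. Real throughput depends on how much of the model stays on the GPU, so treat this as an order-of-magnitude estimate only.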