by ac29 75 days ago
This article is about an MoE model with only 4B active parameters; it shouldn't take 10 minutes to answer a question about a small project.
I measured a 4-bit quant of this model at 1300 t/s prefill and ~60 t/s decode on Ryzen 395+.
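For a rough sense of scale, here is a back-of-the-envelope sketch that turns those two throughput figures into a wall-clock estimate. The token counts (a 20,000-token prompt and a 1,000-token answer) are made-up illustrative values, not measurements, and the function name is hypothetical. It also assumes a single prefill pass followed by a single decode pass, which ignores multi-turn agent loops, tool calls, and any slowdown at long context.

```python
# Measured rates quoted in the comment above (tokens per second).
PREFILL_TPS = 1300.0
DECODE_TPS = 60.0


def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float = PREFILL_TPS,
                     decode_tps: float = DECODE_TPS) -> float:
    """Estimate wall-clock time for one prefill pass plus one decode pass."""
    prefill_time = prompt_tokens / prefill_tps  # time to ingest the prompt
    decode_time = output_tokens / decode_tps    # time to generate the answer
    return prefill_time + decode_time


if __name__ == "__main__":
    # Hypothetical workload: 20k prompt tokens, 1k generated tokens.
    total = estimate_seconds(20_000, 1_000)
    print(f"estimated time: {total:.1f} s")
```

Under those assumptions the estimate comes out to roughly half a minute, well short of 10 minutes, so a much larger prompt or many sequential passes would be needed to account for the difference.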