Hacker News
by ed 432 days ago
Interesting direction for research, but not a model you'd want to use today. The paper looks at a 3B model built on Llama 3.2 3B and modified to use Mamba, and compares it to a 1.5B-parameter distilled version of R1.