| HN Mirror

Hacker News


	by ed 432 days ago
	Interesting direction for research, but not a model you'd want to use today. The paper looks at a 3B model built on Llama 3.2 3B and modified for Mamba, and they compare it to a distilled version of R1 with 1.5B params.