HN Mirror



	by pona-a 488 days ago
	I tested it on basic long addition problems. It frequently misplaced the decimal point, wasted reasoning tokens (for example, restating steps it had already done), and overall seemed only marginally more reliable than the base DeepSeek 1.5B. On my own pet eval, writing a fast Fibonacci algorithm in Scheme, it actually performed much worse. It went on a much longer tangent before arriving at the fast doubling algorithm, but then completely forgot how to write S-expressions, instead imagining that Scheme uses a Python-like syntax while babbling about tail recursion.

2 comments

viraptor 488 days ago

> On my own pet eval, writing a fast Fibonacci algorithm in Scheme,

It seems this model was trained only on math problem datasets. It makes sense that it's not any better at programming.


pona-a 488 days ago

The original model, aside from its programming mistakes, also misremembered the doubling formula. I hoped to see that fixed, which it was, and maybe also a more general performance boost from recovering some of the distillation loss.
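
For readers unfamiliar with it, the doubling formula here presumably refers to the standard fast doubling identities, F(2k) = F(k) * (2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2. Below is a minimal sketch of the technique in Python, not in the Scheme the eval asked for and not any model's output; the names `fib_pair` and `fib` are made up for illustration.

```python
def fib_pair(n):
    """Return the pair (F(n), F(n+1)) using fast doubling.

    Illustrative helper, not taken from the thread.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return (0, 1)
    # Recurse on k = n // 2, so a = F(k) and b = F(k+1).
    a, b = fib_pair(n // 2)
    c = a * (2 * b - a)   # F(2k)   = F(k) * (2*F(k+1) - F(k))
    d = a * a + b * b     # F(2k+1) = F(k)^2 + F(k+1)^2
    if n % 2 == 0:
        return (c, d)     # n = 2k:     (F(2k), F(2k+1))
    return (d, c + d)     # n = 2k + 1: (F(2k+1), F(2k+2))


def fib(n):
    """Return F(n), with F(0) = 0 and F(1) = 1."""
    return fib_pair(n)[0]
```

Each call halves n, so the recursion depth is about log2(n), compared with n steps for the naive loop. Calling `fib(10)` gives 55.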


ekidd 484 days ago

This model can't code at all.

It does high school math homework, plus maybe some easy physics. And it does them surprisingly well. Outside of that, it fails every test prompt in my set.

It's a pure specialist model.
