by pona-a 488 days ago
The original model, aside from its programming mistakes, also misremembered the doubling formula. I hoped to see that fixed (it was), and perhaps also a more general performance boost from recovering some of the distillation loss.