Hacker News
by whiplash451 184 days ago
Not just hyperparameter tweaking. Not foundational research either. Rather, engineering improvements that compound with each other (conswiglu layers, Muon optimizer).