Hacker News
by logicchains 773 days ago
So many papers play tricks with the learning rate schedule: https://arxiv.org/abs/2307.06440