Hacker News
A Batch Size and Token Number Agnostic Learning Rate Scheduler (arxiv.org)
2 points by veryluckyxyz 389 days ago