Hacker News new | ask | show | jobs
by merizian 543 days ago
Because of mup [0] and scaling laws, you can test ideas empirically on smaller models, with some confidence they will transfer to the larger model.

[0] https://arxiv.org/abs/2203.03466