by yorwba
453 days ago
Run a grid of hyperparameters on small models of several different sizes to find out how the optimal values change as you scale up (the "scaling laws"), extrapolate those trends to predict the best settings and the resulting performance at even larger scales, then do a single large run and hope that your predictions aren't too far off.
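To make the extrapolation step concrete, here is a minimal sketch, assuming the optimal value of a hyperparameter (here the learning rate) follows a power law in model size. The function names, the model sizes, and the "best learning rate" numbers are all hypothetical and synthetic, generated from a made-up power law purely for illustration. Real sweeps are noisy and may not follow a clean power law at all.

```python
import math

def fit_power_law(sizes, values):
    """Least-squares fit of values ~= a * sizes**b, done as a line in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log-log space = log(a)
    return a, b

def predict(a, b, size):
    """Extrapolate the fitted power law to a new model size."""
    return a * size ** b

# Hypothetical sweep results: the best learning rate found by grid search
# at each small model size. These are synthetic numbers, not measurements.
small_sizes = [1e6, 1e7, 1e8]  # parameter counts (assumed)
best_lrs = [3e-3 * (n / 1e6) ** -0.25 for n in small_sizes]

a, b = fit_power_law(small_sizes, best_lrs)

# Extrapolate two orders of magnitude past the largest swept model.
large_size = 1e10
large_lr = predict(a, b, large_size)
print(f"fitted exponent: {b:.3f}")
print(f"predicted best learning rate at {large_size:.0e} params: {large_lr:.2e}")
```

The same fit can be repeated for the loss itself to predict performance at the large scale. The "hope" part is that the trend measured on small models keeps holding outside the range you actually swept.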