|
|
|
|
|
by snek_case
95 days ago
|
|
Probably constrained by training resources. It's much easier to experiment with a smaller architecture. You may need many training runs to figure out hyperparameters for example. If each run needs multiple GPUs for a week the cost adds up quickly. I think it makes a lot of sense to start small. |
|