by londons_explore
1262 days ago
Slower training tends to be only a little cheaper, because most modern architectures parallelize well, so the cost depends almost entirely on the total number of FLOPs rather than on how quickly you spend them. If you want to reduce cost, you need to reduce the model size, and you'll get worse results for less money.
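
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes perfect parallel scaling and the common rule of thumb that training takes about 6 FLOPs per parameter per token; the function name, per-GPU throughput and hourly price are all made-up illustrative values, not real quotes.

```python
def training_cost(params, tokens, n_gpus,
                  flops_per_gpu_per_s=1e14,   # hypothetical sustained throughput
                  usd_per_gpu_hour=2.0):      # hypothetical rental price
    """Return (wall_clock_hours, cost_usd) for one training run.

    Assumes perfect scaling across GPUs and total FLOPs ~= 6 * params * tokens
    (a rule-of-thumb approximation, not an exact count).
    """
    total_flops = 6 * params * tokens
    wall_clock_s = total_flops / (n_gpus * flops_per_gpu_per_s)
    wall_clock_hours = wall_clock_s / 3600
    gpu_hours = n_gpus * wall_clock_hours
    return wall_clock_hours, gpu_hours * usd_per_gpu_hour


# Same model, trained slowly on 8 GPUs vs. quickly on 64 GPUs.
slow_hours, slow_cost = training_cost(1e9, 2e10, n_gpus=8)
fast_hours, fast_cost = training_cost(1e9, 2e10, n_gpus=64)

# Half-size model on the same data and hardware.
_, small_cost = training_cost(5e8, 2e10, n_gpus=8)
```

Under these assumptions the slow and fast runs cost the same, since the GPU count cancels out of the GPU-hours, and only the wall-clock time changes. The half-size model is the one that actually costs less. In practice scaling is not perfect, which is where the "only a little cheaper" for slower training comes from.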