by danaris
510 days ago
This assumes no (or very small) diminishing returns effect. I don't pretend to know much about the minutiae of LLM training, but it wouldn't surprise me at all if throwing massively more GPUs at this particular training paradigm only produces marginal increases in output quality.