by danaris 510 days ago
This assumes no (or very small) diminishing returns effect.

I don't pretend to know much about the minutiae of LLM training, but it wouldn't surprise me at all if throwing massively more GPUs at this particular training paradigm only produces marginal increases in output quality.

1 comment

I believe the margin left to expand is CoT (chain of thought), where token counts can grow dramatically. If there is value in putting more compute toward it, there may still be returns to capture on that margin.