There's also the question of input data, though. Current large models have been trained on all the available human-created input. Trying to add more will lead to poisoning by AI-generated data and model collapse.
Model collapse is basically a myth and is a joke in the ML community. The assumptions of the model collapse paper do not hold in the real world, even when training on uncurated generated data. In fact, LLMs of equal size trained on newer web scrapes, which include generated data, have enhanced capabilities.
But in practice, training data is curated, and curated synthetic training data is even better than human data. State-of-the-art LLMs like Phi-2 or the recent GPT-4 killer Claude 3 are trained entirely or mostly on generated data.
It's likely that you can train the same system on the same data multiple times and still get an improvement. You could also train on universal sequence prediction data in between.