If AI is meant to sound nearly identical to a human, you don't need more training data.
If it's meant to act as a natural language encyclopedia, we'll never get there with LLMs, which amount to natural language processing on top of a massively compressed dataset.
Trying to make AIs more factually accurate with more training is probably hopeless. Current events and encyclopedic knowledge will be provided by tools. The LLM's core job is to choose the right tools and synthesize their outputs.
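To make the "choose tools, then synthesize" idea concrete, here's a rough Python sketch. Everything in it is made up for illustration: the two tools, the `TOOLS` registry, and the keyword-based `choose_tools` (which only stands in for the decision a real LLM would make) are assumptions, not any actual product's API.

```python
from typing import Callable, Dict, List


# Hypothetical tools: stand-ins for a news API and an encyclopedia lookup.
def search_news(query: str) -> str:
    return f"[news results for: {query}]"


def lookup_encyclopedia(query: str) -> str:
    return f"[encyclopedia entry for: {query}]"


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "news": search_news,
    "encyclopedia": lookup_encyclopedia,
}


def choose_tools(question: str) -> List[str]:
    """Stand-in for the model's tool choice; a real system would ask the LLM."""
    q = question.lower()
    chosen: List[str] = []
    if any(word in q for word in ("today", "latest", "this week")):
        chosen.append("news")
    if any(word in q for word in ("who", "what", "when", "where")):
        chosen.append("encyclopedia")
    # Fall back to the encyclopedia when nothing matched.
    return chosen or ["encyclopedia"]


def answer(question: str) -> str:
    """Collect tool outputs and join them; the LLM would write the final prose."""
    outputs = [TOOLS[name](question) for name in choose_tools(question)]
    return " ".join(outputs)
```

The point of the sketch is that the facts live in the tools, not in the model's weights. The model only has to route the question and phrase the result.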