As you mentioned, the data processing inequality[1] applies here, but I imagine synthetic data could help training squeeze out more from the existing data.
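For reference, a rough statement of the inequality. The symbols are my own labels, not anything from the parent comment: take X as the underlying facts, Y as the existing data, and Z as synthetic data generated only from Y, so that X → Y → Z forms a Markov chain.

```latex
X \to Y \to Z \quad\Longrightarrow\quad I(X;Z) \le I(X;Y)
```

So Z can't carry more information about X than Y already did. The hope is only that it puts that information in a form training can use more easily.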
It's neat how a longer "digestive tract" loses information, but can make up for it by making more sense of things. It's akin to adding an NN layer, to using a more computationally intensive lossy compression algorithm, or to asking an LLM to explain the problem domain and the relevant variables (populating attention) before getting to the point.
It's probably true for people too. Instead of asking an expert for an opinion right away, ask them to discuss the options out loud first.
There are probably a lot of applications where the LLM could rely more on data that's supplied to it just-in-time in the context window, and less on specialist knowledge from its training set.
Also, "natural" data taken from the Internet is probably quite inefficient as training material. It's going to have a lot of duplication. You only need each fact once to be able to synthesize more examples of it.
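As a toy illustration of the duplication point, here is a minimal sketch of exact-match deduplication after light normalization. The function names and the normalization rule are my own assumptions, and real pipelines would likely need fuzzy or near-duplicate matching rather than this.

```python
import hashlib


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse runs of whitespace
    # so trivially different copies hash to the same key.
    return " ".join(text.lower().split())


def dedupe(docs):
    # Keep the first occurrence of each normalized document, in order.
    seen = set()
    unique = []
    for doc in docs:
        key = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    corpus = ["The sky is blue.", "the  sky is BLUE.", "Water is wet."]
    print(dedupe(corpus))
```

This only catches copies that are identical after normalization; paraphrases of the same fact would still slip through.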