Hacker News new | ask | show | jobs
by qsera 52 days ago
That is like saying we can get unlimited data compression by feeding the output of a data compressing program into its own input..
1 comments

Not it's not like saying that at all.
Yes, that is exactly like it.

Generating new training data from existing data will only generate patterns that already exist in the training data. It might help LLMs to capture it if has not already. But it can never generate novel patterns.

For example, Imagine some kind of neural network architecture that can do OCR. You might be able to generate variations of the letters it already know using some technique and use it to better train the recognition of already known letters.

But it would never be able to generate letters that it does not know.