| HN Mirror

Maybe some of them already embed some simple, secret marker to identify their own generated content. But people outside the organization wouldn’t know. And this still can’t prevent other companies from training models on synthetic data.

Once synthetic data becomes pervasive, it’s inevitable that some of it will end up in the training process. Then it’ll be interesting to see how the information world evolves: AI-generated content built on synthetic data produced by other AIs. Over time, people may trust AI-generated content less and less.