Hacker News
by LegionMammal978 561 days ago
Yep, that's what I've been thinking since people started talking about it. I hear that AI plagiarism detectors can never work, since LLM output can't be detected with any accuracy. Yet I also hear that LLMs in training easily sift out any generated content from their input data, so recursion (models training on their own output) is a non-issue. It doesn't make much sense to have it both ways.

I wonder whether the sifting out of synthetic training data, to the extent it really works, is based on signals separate from the content itself: the source of the data, the reported author, links to and from it, etc.

These signals would be unavailable to a plagiarism/AI detector.