Hacker News
by markwkw 701 days ago
You trim, yes, but AI content surely invades (all?) areas of written material. People are increasingly using AI to assist their writing, even if it's only for slight editing or word-choice suggestions.

Even AP doesn't ban the use of LLMs; its standards only prohibit directly publishing AI-generated content. I'm sure its writers leverage LLMs in some ways in their workflow, though. They would probably continue to use them even if AP attempted to ban LLMs (human incentives).

1 comments

If the AI-generated content is filtered for quality or corrected, then it will still be good data. Model degradation only occurs when there is no outside influence on the generated data.
I think this is extremely important with AI-generated content, but it seems to be given less and less thought as people start to "trust" AI as it seeps further into the public consciousness. It needs to be reviewed, filtered, and fixed where appropriate. After that, it isn't any different from reviewing data on your own and wording it in a way that fits the piece you're writing. Unfortunately, there's so much trust in AI now that people will go ahead and publish content without even reading it for the correct tense!
The same problem exists if you blindly trust any source without verifying it. There is a huge amount of endlessly recycled, incorrect blog spam out there for all domains. Not only that, but this problem has always existed for second-hand information, so it's not like we were starting from some pristine state of perfect truthfulness. We have the tools we need to deal with the situation, and they were developed hundreds of years ago, empiricism being chief among them. Nullius in verba[0]

[0] https://en.wikipedia.org/wiki/Nullius_in_verba

If tail events aren't produced by these models, no amount of human filtering will get them back. People would need not just to filter or adjust AI-generated content, but to create novel content of their own.
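
A toy sketch of that point, not a claim about how any real model behaves: assume a hypothetical "model" that samples a standard normal but never emits anything beyond a cutoff (the names `generate`, `human_filter`, and the cutoff of 2.0 are all made up for illustration). Filtering can only keep or drop what was generated, so the tail count stays at zero no matter what filter is applied.

```python
import random

CUTOFF = 2.0  # assumed: the hypothetical model never emits |x| beyond this


def generate(rng, n, cutoff=CUTOFF):
    """Hypothetical 'model': standard normal samples with the tails cut off."""
    out = []
    while len(out) < n:
        x = rng.gauss(0.0, 1.0)
        if abs(x) <= cutoff:  # tail events are silently never produced
            out.append(x)
    return out


def human_filter(samples, keep):
    """Human review modeled as selection: it keeps or drops existing samples only."""
    return [x for x in samples if keep(x)]


rng = random.Random(0)
generated = generate(rng, 10_000)

# Even a reviewer who deliberately keeps only the most extreme outputs
# cannot recover values the model never produced.
filtered = human_filter(generated, keep=lambda x: abs(x) > 1.5)
tail_count = sum(1 for x in filtered if abs(x) > CUTOFF)

print(len(generated), len(filtered), tail_count)
```

Getting tail events back in this toy setup requires adding samples from outside the model, which is the "create novel content" part.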