Hacker News
by SchemaLoad 346 days ago
I've been wondering if ChatGPT makes such excessive use of the em dash just so people can easily identify AI-generated content.

Google wouldn't even need a fingerprint; they could just look up in their logs who generated the video.

Google has already admitted it fingerprints generative video, and it has a safety obsession, so I guarantee it does the same to its LLMs. Another reason is to pollute the output that folks like DeepSeek use to train derivative models.
The em dash is one marker, but I’ve read that most LLMs introduce small but statistically detectable biases into their output to help them avoid re-ingesting their own content.
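The comment doesn't say where such biases come from. One published scheme that works this way is the "green list" watermark from Kirchenbauer et al., "A Watermark for Large Language Models" (2023). A secret key plus the previous token pseudorandomly splits the vocabulary, generation favors the "green" part, and a detector counts green tokens and runs a one-proportion z-test, z = (green − γT) / sqrt(Tγ(1−γ)). What follows is a toy sketch under loudly stated assumptions: string tokens, SHA-256 as the pseudorandom function, γ = 0.5, a hard "green only" rule instead of the paper's soft logit bias, and the hypothetical helpers `is_green`, `generate_watermarked` and `watermark_z_score`. Nothing here is known to be what ChatGPT, Gemini or any other product actually does.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of (prev_token, token) pairs that count as "green"


def is_green(prev_token: str, token: str, key: str = "secret-key") -> bool:
    """Hypothetical green-list test: hash key + previous token + candidate
    token and call the pair green if the hash lands below GAMMA."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return value < GAMMA


def generate_watermarked(vocab: list[str], length: int,
                         key: str = "secret-key", seed: int = 0) -> list[str]:
    """Toy 'model' that picks random tokens but only green continuations
    (the hard variant; real schemes usually just nudge green logits up)."""
    rng = random.Random(seed)
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        candidates = vocab[:]
        rng.shuffle(candidates)
        # Prefer a green continuation; fall back to any token if none is green.
        nxt = next((c for c in candidates if is_green(tokens[-1], c, key)),
                   candidates[0])
        tokens.append(nxt)
    return tokens


def watermark_z_score(tokens: list[str], key: str = "secret-key") -> float:
    """One-proportion z-test: how far the observed green count sits above
    the GAMMA * T expected from text produced without this key."""
    pairs = list(zip(tokens, tokens[1:]))
    t = len(pairs)
    if t == 0:
        return 0.0
    green = sum(is_green(prev, cur, key) for prev, cur in pairs)
    return (green - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(50)]
    marked = generate_watermarked(vocab, 101)
    print("with the right key:", round(watermark_z_score(marked), 2))
    print("with a wrong key:  ", round(watermark_z_score(marked, key="other-key"), 2))
```

If every one of the T pairs is green, the score is exactly sqrt(T), so a 100-pair sample lands around 10, while a detector holding the wrong key should see something near 0. That gap is the "statistically detectable" part: a few hundred tokens are enough to separate the two, yet no single word choice looks unusual to a human reader.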