People over-update when they see an em-dash. If you compute the posterior probability, you'll realize that seeing an em-dash hardly shifts the probability that a text is AI-generated.
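For anyone who wants to try the calculation, here is a rough sketch of Bayes' rule applied to this case. Every number in it is a made-up assumption for illustration (the prior share of AI text and both em-dash rates), and the function name `posterior_ai` is hypothetical, so plug in your own estimates.

```python
def posterior_ai(prior_ai, p_dash_given_ai, p_dash_given_human):
    """Return P(AI | em-dash seen) using Bayes' rule."""
    # Total probability of seeing an em-dash, across AI and human text.
    evidence = p_dash_given_ai * prior_ai + p_dash_given_human * (1 - prior_ai)
    return p_dash_given_ai * prior_ai / evidence


# Hypothetical inputs, NOT measured values:
prior = 0.10      # assumed share of texts that are AI-generated
p_ai = 0.30       # assumed chance an AI text contains an em-dash
p_human = 0.20    # assumed chance a human text contains an em-dash

post = posterior_ai(prior, p_ai, p_human)
print(f"prior = {prior:.3f}, posterior = {post:.3f}")
```

With these assumed inputs the probability moves from 0.100 to about 0.143. The size of the shift depends entirely on the ratio of the two em-dash rates: if they are equal, the posterior equals the prior, and the further apart they are, the more a single em-dash tells you.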
That's a fair point. I'll admit that LLM em-dash usage likely matches its prevalence in the corpus. However, online message boards and chats are a subset of that corpus that may have a significantly different rate of em-dash usage.
This is necessary nuance that I'll have to take into consideration. Thank you.
Eh, I wasn't trying to put you on blast or anything. Obviously there is a lot of nuance to how/why the LLM uses them. I just think the whole "em dash == AI" thing is stupid and way overblown. I wish the people who say that would realize it obviously learned them from somewhere.