Hacker News
by kelseyfrog 292 days ago
People over-update when they see an em-dash. If you compute the posterior probability, you'll realize that seeing an em-dash hardly shifts the probability that text is AI-generated.
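A minimal sketch of that computation using Bayes' rule. The rates are hypothetical placeholders, not measurements: a 50% prior that a comment is AI-generated, and em-dashes appearing in 12% of AI comments versus 10% of human ones. The function name `posterior_ai` is illustrative.

```python
def posterior_ai(prior_ai: float, p_dash_given_ai: float, p_dash_given_human: float) -> float:
    """Return P(AI | em-dash seen) via Bayes' rule (probability form)."""
    # Joint probability that a comment is AI-written AND contains an em-dash
    ai_and_dash = prior_ai * p_dash_given_ai
    # Joint probability that a comment is human-written AND contains an em-dash
    human_and_dash = (1.0 - prior_ai) * p_dash_given_human
    # Divide by the total probability of seeing an em-dash at all
    return ai_and_dash / (ai_and_dash + human_and_dash)


# Hypothetical rates, chosen only for illustration (not measured)
prior = 0.5
post = posterior_ai(prior, p_dash_given_ai=0.12, p_dash_given_human=0.10)
print(f"prior={prior:.2f}  posterior={post:.3f}")  # prior=0.50  posterior=0.545
```

Going from 0.50 to about 0.55 is a small update. The size of the jump is governed by how different the two em-dash rates are, not by the mere presence of the em-dash.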

You might even realize that the AI is using the em-dash precisely because of how often it was used in the training text.
That's a fair point. I'll admit that LLM em-dash usage likely matches its prevalence in the corpus. However, online message boards and chat are a subset of that corpus that may have a significantly different em-dash prevalence.

This is necessary nuance that I'll have to take into consideration. Thank you.
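To make the subset point concrete: what drives the update is the likelihood ratio P(em-dash | AI) / P(em-dash | human), and the human rate should come from the kind of text being judged. The sketch below holds a hypothetical AI rate fixed and compares two hypothetical human baselines, one for the corpus overall and one for a forum/chat subset where people might type em-dashes less often. None of these numbers are measured, and `likelihood_ratio` / `update` are illustrative names.

```python
def likelihood_ratio(p_dash_given_ai: float, p_dash_given_human: float) -> float:
    """How many times more likely an em-dash is under 'AI' than under 'human'."""
    return p_dash_given_ai / p_dash_given_human


def update(prior_ai: float, lr: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_ai / (1.0 - prior_ai)
    post_odds = prior_odds * lr
    # Convert odds back to a probability
    return post_odds / (1.0 + post_odds)


# Hypothetical em-dash rates (illustration only):
# one AI rate, and two candidate human baselines.
p_ai = 0.12
baselines = {"whole corpus": 0.10, "forum subset": 0.03}

for name, p_human in baselines.items():
    lr = likelihood_ratio(p_ai, p_human)
    print(f"{name}: LR={lr:.1f}, posterior={update(0.5, lr):.2f}")
```

With these made-up rates, switching to the lower forum baseline raises the likelihood ratio from about 1.2 to about 4, so the same em-dash moves a 50% prior to roughly 80% instead of roughly 55%.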

Eh, I wasn't trying to put you on blast or anything. Obviously there is a lot of nuance to how/why the LLM uses them. I just think the whole "em dash == AI" thing is stupid and way overblown. I wish the people who say that would realize it obviously learned them from somewhere.
You're good, friend. I didn't feel blasted at all. It was a good point.
“Over-update”? Do you mean “overreact”?
I'm guessing it means over-updating their opinion/view, but that's the first time I've run into that turn of phrase.
No, I mean over-update in the Bayesian sense. As in, they update their prior probability with evidence to arrive at a posterior probability, and they shift it further than the evidence warrants.
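For reference, the standard odds form of Bayes' rule makes the "over" in over-update concrete (this is the textbook formula, not notation from the thread). Posterior odds equal prior odds times the likelihood ratio, and in log-odds the warranted shift from one em-dash is exactly the log of that ratio:

```latex
\frac{P(\text{AI} \mid \text{em-dash})}{P(\text{human} \mid \text{em-dash})}
  = \frac{P(\text{AI})}{P(\text{human})}
    \times \underbrace{\frac{P(\text{em-dash} \mid \text{AI})}{P(\text{em-dash} \mid \text{human})}}_{\mathrm{LR}}
\qquad\Longrightarrow\qquad
\log O_{\text{post}} = \log O_{\text{prior}} + \log \mathrm{LR}
```

Over-updating means moving one's log-odds by more than $\log \mathrm{LR}$; when the two em-dash rates are similar, $\mathrm{LR} \approx 1$ and the warranted shift is close to zero.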
"I don't often jargon, but when I do jargon I jargon like Chuck Norris"
Huh?
See em dash in text

Think text much more likely from robot than first thought

Grug say this change too big from just one em dash