

	by kelseyfrog 292 days ago
	People over-update when they see an em-dash. If you compute the posterior probability, you'll realize that seeing an em-dash hardly shifts the probability that text is AI-generated.

2 comments

anon84873628 292 days ago

You might even realize that the AI is using the em-dash precisely because of how often it was used in the training text.

kelseyfrog 292 days ago

That's a fair point. I'll admit that LLM em-dash usage likely matches its prevalence in the corpus. However, online message boards and chat are a subset of that corpus in which em-dash prevalence may differ significantly.

This is necessary nuance that I'll have to take into consideration. Thank you.

anon84873628 292 days ago

Eh, I wasn't trying to put you on blast or anything. Obviously there is a lot of nuance to how/why the LLM uses them. I just think the whole "em dash == AI" thing is stupid and way overblown. I wish the people who say that would realize it obviously learned them from somewhere.

kelseyfrog 292 days ago

You're good, friend. I didn't feel blasted at all. It was a good point.

pimlottc 292 days ago

“Over-update”? Do you mean “overreact”?

jdiff 292 days ago

I'm guessing over-update their opinion/view, but that's the first time I've run into that turn of phrase.

kelseyfrog 292 days ago

No, I mean over-update in a Bayesian capacity. As in, they are updating their prior probability using evidence to arrive at the posterior probability.
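
To make the Bayesian update concrete, here is a minimal sketch. All three input numbers are hypothetical, picked only for illustration and not measured from any corpus: a 10% prior that a given comment is AI-written, and em-dash rates of 30% for AI text versus 20% for human text. The function name `posterior_ai` is likewise just an illustrative choice.

```python
def posterior_ai(prior_ai, p_dash_given_ai, p_dash_given_human):
    """Return P(AI | em-dash) via Bayes' rule.

    prior_ai           -- P(AI) before looking at the text
    p_dash_given_ai    -- P(em-dash | AI), assumed value
    p_dash_given_human -- P(em-dash | human), assumed value
    """
    joint_ai = prior_ai * p_dash_given_ai
    joint_human = (1.0 - prior_ai) * p_dash_given_human
    return joint_ai / (joint_ai + joint_human)


# Hypothetical inputs, for illustration only.
prior = 0.10
post = posterior_ai(prior, p_dash_given_ai=0.30, p_dash_given_human=0.20)
print(f"prior = {prior:.2f}, posterior = {post:.2f}")
```

With these made-up inputs the posterior comes out at roughly 0.14, up from 0.10. How far it moves depends entirely on the ratio of the two em-dash rates: if the rates were equal, the posterior would equal the prior, and the em-dash would carry no information at all.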

ggm 292 days ago

"I don't often jargon, but when I do jargon I jargon like Chuck Norris"

firesteelrain 292 days ago

Huh?

cthor 292 days ago

See em dash in text

Think text much more likely from robot than first thought

Grug say this change too big from just one em dash
