Hacker News
by the_fall 134 days ago
No. No one is looking for em-dashes, except for some bozos on the internet. The "default voice" of all mainstream LLMs can be easily detected by looking at the statistical distribution of word / token sequences. AI detector tools work and have very low false-negative rates. They have a small percentage of false positives because a small percentage of humans pick up the same writing habits, but that's not relevant here.
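
To make "statistical distribution of word / token sequences" concrete, here is a minimal sketch of one way such a score could work: compare how likely a text's bigrams are under a model built from LLM-style text versus one built from human text. This is only an illustration, not how any particular detector is implemented. The two tiny corpora, the `BigramModel` class, and `llm_score` are all made up for the example; a real tool would use far more data and a stronger model.

```python
import math
from collections import Counter


def bigrams(text: str) -> list[tuple[str, str]]:
    """Split on whitespace and return adjacent word pairs."""
    toks = text.lower().split()
    return list(zip(toks, toks[1:]))


class BigramModel:
    """Bigram counts over a corpus, with add-one smoothing."""

    def __init__(self, corpus: list[str]) -> None:
        self.counts = Counter(bg for doc in corpus for bg in bigrams(doc))
        self.total = sum(self.counts.values())

    def logprob(self, bg: tuple[str, str], vocab_size: int) -> float:
        return math.log((self.counts[bg] + 1) / (self.total + vocab_size))


def llm_score(text: str, llm: BigramModel, human: BigramModel) -> float:
    """Mean log-likelihood ratio per bigram; > 0 leans 'LLM-like'."""
    # +1 reserves one slot for bigrams seen in neither corpus.
    vocab_size = len(set(llm.counts) | set(human.counts)) + 1
    bgs = bigrams(text)
    if not bgs:
        return 0.0
    ratio = sum(llm.logprob(b, vocab_size) - human.logprob(b, vocab_size) for b in bgs)
    return ratio / len(bgs)


# Toy reference corpora, invented purely for illustration.
LLM_CORPUS = [
    "it is important to note that this is a rich tapestry",
    "let us delve into the rich tapestry of ideas",
]
HUMAN_CORPUS = [
    "nah that build broke again last night",
    "the build broke so i reverted it",
]

llm_model = BigramModel(LLM_CORPUS)
human_model = BigramModel(HUMAN_CORPUS)

print(llm_score("it is important to note that", llm_model, human_model))
print(llm_score("the build broke again", llm_model, human_model))
```

The first score comes out positive and the second negative; where to put the decision threshold is exactly what determines the false positive / false negative trade-off.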

The "humanizer" filters will typically just use an LLM prompted to rewrite the text in another voice (which can be as simple as "you're a person in <profession X> from <region Y> who prefers to write tersely"), or specifically flag the problematic word sequences and ask an LLM to rephrase.
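
A rough sketch of the second approach (flag the problematic sequences, then ask an LLM to rephrase) might look like the following. It only builds the prompt; the actual LLM call is left out because it depends on the provider. The phrase list, `find_flagged`, and `build_rewrite_prompt` are hypothetical names invented for this example, not taken from any real humanizer.

```python
import re

# Hypothetical list of phrases a detector might key on.
FLAGGED_PHRASES = ["delve into", "rich tapestry", "it is important to note"]


def find_flagged(text: str, phrases: list[str] = FLAGGED_PHRASES) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for every flagged phrase, in order."""
    hits = []
    for phrase in phrases:
        for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0)))
    return sorted(hits)


def build_rewrite_prompt(text: str, persona: str) -> str:
    """Assemble a rewrite prompt: persona first, then the phrases to avoid."""
    hits = find_flagged(text)
    avoid = ", ".join(repr(h[2]) for h in hits) or "none"
    return (
        f"{persona}\n"
        f"Rewrite the text below in your own voice. Avoid these phrases: {avoid}.\n\n"
        f"{text}"
    )


prompt = build_rewrite_prompt(
    "Let us delve into the rich tapestry of caching.",
    "You're a sysadmin who prefers to write tersely.",
)
print(prompt)
```

Note that nothing here checks facts or references; it only changes the surface wording.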

They most certainly don't improve the "correctness" and don't verify references, though.

1 comments

Providers are also adding hidden characters and attempting to watermark output, if memory serves.
It's more complex than that. It's called SynthID Text, and it biases the probabilities of token generation in a way that can be recovered down the line.
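
For intuition on how biasing token probabilities can be recovered later, here is a toy sketch of a simpler, well-known "green list" style watermark. To be clear, this is not SynthID Text's actual algorithm; it is a swapped-in simplification that only shows the general idea of a keyed bias at generation time plus a statistical test at detection time. The vocabulary, the key, and the uniform "model" are all assumptions made for the example.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary (hypothetical)


def is_green(prev: str, tok: str, key: str = "secret") -> bool:
    """Keyed pseudo-random split of the vocabulary, re-drawn for each previous token."""
    digest = hashlib.sha256(f"{key}|{prev}|{tok}".encode()).digest()
    return digest[0] % 2 == 0


def generate(n: int, bias: float, seed: int = 0) -> list[str]:
    """Sample n tokens from a uniform toy 'model', boosting green tokens by exp(bias)."""
    rng = random.Random(seed)
    out = ["w0"]
    for _ in range(n):
        prev = out[-1]
        weights = [math.exp(bias) if is_green(prev, t) else 1.0 for t in VOCAB]
        out.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return out


def green_z_score(tokens: list[str]) -> float:
    """Z-score of the green-token count against the 50% expected without a watermark."""
    n = len(tokens) - 1
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


marked = generate(400, bias=3.0)
plain = generate(400, bias=0.0)
print(green_z_score(marked), green_z_score(plain))
```

Anyone holding the key can recompute the green/red split and run the test, while the text itself contains no hidden characters. The watermarked sample scores far above the unbiased one.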