by Baader-Meinhof 26 days ago
I like that these AI idioms exist. They're like watermarks for text. It's worth the cost of humans avoiding them. Companies will eventually train their models to be undetectable, but society would be better if they didn't.
11 comments

Humans are just trying to do what Pangram is trying to do: guess what is AI, badly. The post argues against this:

> In the end, shaming people for writing that gets flagged as AI can lead people to sidestep structures the model has learned from us: structures that are effective tools for argumentation. We take the tools of critical thinking out of the kit at the time we most need them.

This is my position with this stuff. It became part of the LLM loop because it’s used a lot- it’s used a lot because it’s effective.

Now we’re going to stop using effective rhetorical methods because they imply AI, even if we know we’re not using AI?

It reminds me of, as a teenager, asking my dad if he ever saw Led Zeppelin live. He hadn’t, because he didn’t really like fans of Led Zeppelin and didn’t want to be associated with them.

As an ashamed fan of certain bands I get this instinct, but when I heard this I also promised myself that I would do my best not to let other people influence how I thought about things I enjoyed.

On the same note I’m trying to be “braver” about things like em-dashes, though my personal style has always been to use them as I did in this comment- like this, which I guess distinguishes me, until an LLM picks that up too…

An em dash looks like this

You're not using that, neither in the past from what I can tell, nor in this comment.

You're just using a hyphen/minus instead of a colon, that's not an llm-ism

Actually, in an ancient and venerable markup language that's still in wide use in certain not-unimportant communities:

- = hyphen

-- = n-dash

--- = m-dash

You may notice that he didn't use the double or triple hyphen annotations either - which are usually only used in contexts such as LaTeX, where a post-processor goes over the output for display.
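For anyone who wants to see the convention listed above in action, here is a minimal Python sketch. It assumes a plain regex pass over the text; the function name `tex_dashes` is hypothetical, and as far as I know real TeX handles this at the font level with ligatures rather than with a text substitution like this one.

```python
import re

EN_DASH = "\u2013"  # what "--" stands for in the convention above
EM_DASH = "\u2014"  # what "---" stands for in the convention above


def tex_dashes(text: str) -> str:
    """Replace runs of hyphens following the TeX-style convention (illustrative only)."""

    def swap(match: re.Match) -> str:
        run = match.group(0)
        if len(run) == 2:
            return EN_DASH
        if len(run) == 3:
            return EM_DASH
        # A single hyphen stays a hyphen; runs of four or more are left untouched.
        return run

    return re.sub(r"-+", swap, text)
```

A single hyphen, as used in the comment being discussed, passes through unchanged, which is the point being made: it is neither an en dash nor an em dash under this convention.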
I like to use a lazy variant -- it's not a double dash, or a weirdly written plus, it's an em-dash that says "I don't even have this key on my keyboard, are you actually using alt-codes or what am I missing?". Not with a shout or a whisper, but with the quiet courage of just being -- but not an incomplete representation of a whole, but rather the fullness of that very distillation of honest, simple pragmatism. Not less, just different.

The above isn't slop. It's shit though!

In fact, I'd say it's a dead giveaway for "human impersonating AI impersonating humans". Using the hyphen as an em dash screams
I don’t think it’s quite the same though. The way it constructs thoughts is very algorithmic. The Wikipedia doc on signs of AI writing gives a much better explanation, and better arguments for not immediately blaming someone for using AI.

https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

> It's worth the cost of humans avoiding them

That's really unfortunate though. It's like Michael Bolton from Office Space: "No way! Why should I change? He's the one who sucks."

Telling humans to change how they write just so they won’t be accused of using AI is the most anti-human pro-AI idea imaginable.
AGI Plan to end humanity, act I: communicate so well they will start to communicate horribly, setting back their collective IQ by thousands of years.
Could just as well be AGI's plan to save humanity.
Just be me and include enough grammar and spelling errors that nobody ever confuses you for being AI ;)
Counterpoint: I think it can also be useful to avoid LLM-isms because it's a quick test to check whether you're saying something derivative or actually saying something novel/interesting/significant. Which is to say, if someone could credibly accuse me of being an LLM, then that means my writing is no better (for whatever definition of "better" you want to use) than what happens when you melt down all of human language into a paste and then reconstitute it into featureless little cubes.

Obviously there are exceptions; you can use certain constructions in a way that's still unmistakably human, or use them within a larger context of unmistakably human writing. But in general it makes me think about Orwell's argument against cliches:

> A newly invented metaphor assists thought by evoking a visual image, while on the other hand a metaphor which is technically ‘dead’ (e. g. iron resolution) has in effect reverted to being an ordinary word and can generally be used without loss of vividness. But in between these two classes there is a huge dump of worn-out metaphors which have lost all evocative power and are merely used because they save people the trouble of inventing phrases for themselves.

If LLM-isms give readers the impression that I'm too lazy to phrase things in my own words, even if I did in fact phrase things in my own words, then I take that as a sign that I should pick better words!

Granted, I've had a strong desire to write as distinctly and un-cliche-ish-ly as possible since long before ChatGPT's public launch, so I might not be as grumbly as other commenters who feel like this would force them to change how they write.

A quick and broken test.
I don't think so at all. Models are trained in many ways and are changing aggressively, resulting in different patterns across regions, domains, and languages, and those patterns will be different 3, 5, 10 years down the line. Having everyone learn to stay within magical, fuzzy, and ever-changing boundaries just to avoid appearing to be an AI, instead of focusing on producing good writing or communicating in the way that is natural to them, seems like a recipe for bad thinking and arbitrary reactions.
> will be different 3, 5, 10 years down the line.

Possibly, but it's not a necessity. Clickbait (e.g. YouTube videos) has evolved to stable standards; that's at least my impression.

Also, products often only get improved until they are "good enough", not until they are "good". It happens, but then they just iterate towards the "how bad can I become" baseline from the other side.

AI companies generally are not in the "let's make the best AI possible" business but in the "let's make the most money" business. This just hasn't fully manifested because they get flooded with VC money.

It's like knowing to stay away from a GitHub repo because it has a readme that's full of emoji bullet points.
I actually find the "AI idioms" rather less grating than emoji-vomit. That said, I don't know why certain LLM output seems to be full of the latter; certainly no real human writing I've seen has that style, but perhaps it's a result of training on data that probably should've been done without.
Patrick Wardle, the guy behind Objective-See, had that style in the 2010s when I first started following his work. I actually liked it at the time.
I thought I was the only one that did this. Double stay away if at the end you find out it was "made with love"
I just look for the CLAUDE.md or related.
Except that the entire point of the article is that they're not AI idioms. They're not "watermarks for text." They're legitimate language constructions that LLMs tend to overuse, but that real humans also use. Real humans do, in fact, say "align with" all the time, just as often as "corresponds."

And you can pry my em dashes from my cold, dead hands.

What's worse is that neurodivergent writing, including my own, often resembles AI output. Now it feels like I have to alter my own voice in online discussions just to avoid being accused of pasting an AI response.

The "AI Detection" tools employed by schools also regularly flag writing from people with autism or ADHD, and from non-native English speakers, as AI-generated.

So, naturally, I can't stand the phrase "write like AI" when these things come up. No, there are no humans who "write like AI"; it's the models that have stolen the literary devices from us and poisoned them.

It flags "non-native speakers" text as AI generated? Really? This beggars belief.
https://arxiv.org/abs/2304.02819

> While the detectors were “near-perfect” in evaluating essays written by U.S.-born eighth-graders, they classified more than half of TOEFL essays (61.22%) written by non-native English students as AI-generated (TOEFL is an acronym for the Test of English as a Foreign Language).

> It gets worse. According to the study, all seven AI detectors unanimously identified 18 of the 91 TOEFL student essays (19%) as AI-generated and a remarkable 89 of the 91 TOEFL essays (97%) were flagged by at least one of the detectors.

That is an empirical question. Do you have empirical sources you'd care to share?
The article is not God; just because it claims something doesn't mean we have to accept it.

For better or worse (and pretty much for worse), these usages have become AI idioms. Language evolves over time: things that used to be harmless become offensive, certain terms end up taking on the complete opposite of their original meaning, and we are watching certain language patterns and idioms become watermarks for AI. It sucks, but that doesn't make it false.

I'll just quote from the article, which no one claimed was God and that's really a weird way to dismiss it, but you do you:

"We create a culture of self-censorship and AI-detector-pressured rewriting and paraphrasing as people strive to avoid these witch hunts. That is the opposite of protecting human expression. We should resist normalizing a trust in any machine's ability to determine matters of guilt. If using AI to write is, at its worst, an industrialization of the mind, then AI detection, at its worst, becomes a surveillance system for thought."

And, I'm sorry (I'm not), but I am not going to just roll over and shrug and say "welp, guess we all need to dumb our writing down to keep well-meaning idiots from screeching 'AI! AI! AI! WHOOP! WHOOP WHOOP WHOOP!' at us." That isn't the evolution of language. It's Idiocracy.

Well reading between the lines I don’t think they’re saying all of those uses are AI. They’re legitimate constructs, like the em-dash, en-dash, and hyphen, all of which I used to use regularly. But now they’re AI tells so I use them sparingly.
Sociolinguistic register happened.
Once upon a time, using em dashes—which hardly anyone knew how to conveniently invoke—was a fun writing quirk to have.

Now I'll have to find something else to overuse: maybe sentence structures built around colons, or Japanese 「hook brackets」.

Nah pry my lists of 3 from my cold dead hands. And my emdashes sometime after that.

It's not X, it's Y, though? Couldn't be me.

It's a useful construction. "It's not true love, you matched with her on Hinge last week and have never met her, please don't send her $1000 in Apple gift cards" is punchy.
I think the real tell is when Y~=X. It's just performative. Like genuinely formative (the other tell is real/actual/genuine over a weak claim).
i was just yesterday yelling at Gemini for telling me, five times in a row, that it "had found the absolute truth of the problem" when it was wrong all 5 times lol
> It's worth the cost of humans avoiding them.

No, fuck that. I'm not going to think twice about what I write just to avoid an AI checker, and I will delve into em dashes with gusto if that's what the writing calls for.

I'm not sacrificing the language simply to sound less like AI--that's absolutely a losing game.

And if anyone thinks my hand-crafted prose is AI-generated, they're free to look elsewhere. Right now AI detectors peg my pre-AI work as 30% AI-generated, and I'm certain that number will only increase as LLMs improve.

> I'm not going to think twice about what I write just to avoid an AI checker

It depends on your environment, I guess. If you're a student writing an essay or a researcher writing a paper, it's in your best interest to avoid sounding like an LLM, which means going out of your way to avoid certain idioms, even if it means letting go of things you liked to write.

I used to love a spaced en dash (the British English equivalent of the American unspaced em dash), but I wouldn't risk it now.

Eventually, though, you won't be able to avoid it.
You're right, of course. But today, it's not a risk worth taking, at least not for professional writing where the suspicion of LLM writing can be damning.
I utterly detest the idea of having AI potentially lock me out of my own writing style.
I agree with the feeling. But if you agree with the article's analysis, this cat-and-mouse game ultimately amounts to no longer disclosing our reasoning through commonly accepted linguistic structures. That's quite a price to pay as a society...
Clean. This comment is the right shape.