Hacker News
by buu700 294 days ago
I wouldn't worry about it. A year ago the red flag du jour was "delve"; this year it's em dashes; next year it'll be something else. In any case, this is a very online topic that I assume only a vocal minority are hung up on in the first place. If you picked a random person off the street and asked for their thoughts on em dashes, you'd probably get a blank stare.

Eventually, as models and their users both improve, we'll collectively realize that trying to reliably discriminate between AI and human writing is no different than reading tea leaves. We should judge content based on its intrinsic value, not its provenance. We should call each other out for poor writing or inaccurate information — not because if we squint we can pick out some loose correlations with ChatGPT's default output style.

Consciously trying not to "sound like an LLM" while writing is like consciously trying not to think about the fact that you're currently breathing, or consciously trying to sound like a cool guy.

9 comments

100%. I use em-dashes a decent amount and plan to continue. If someone wants to incorrectly assume it was AI writing, so be it.
I use them occasionally and have never been falsely accused of being an LLM.

The stakes are a bit different for students unfortunately, who’ll have their writing passed through some snake oil AI detector arbitrarily. This is unfortunate because “learning how not to trigger an AI detector” is a totally useless skill.

Generally, I don’t think we need AI detection. We need dumb bullshit detection. Humans and LLMs can both generate that. If people can use an LLM in a way that doesn’t generate dumb bullshit, I’m happy to read it.

I think this is a passing phase - academia and the education system will have to adapt to the fact that LLMs exist and will be used, and that therefore the essay is no longer a useful artifact as evidence of learning. This is probably a good thing in the long run.
Essays are still useful — they just need to be written by hand in exam conditions. No more take-homes.
yeah but writing an essay over the course of a week and over the course of two hours are entirely different experiences -- and the first one is the one that's usually useful in post-graduate life
Same. Years ago I took the time to learn the difference between an em-dash, an en-dash, and a hyphen and I'll continue to use them regardless of what AI does.

I don't use AI in my writing. If I were still in school would I be tempted? Probably. But in work and personal writing? Never crosses my mind.

For me, learning the difference between em-dash, en-dash, and hyphen, and what each of them was supposed to be used for was a side-effect of learning LaTeX.
I agree completely with this as a human reader - but do wonder about the gradual codification of these markers in systems that will increasingly have LLM detection as a standard feature, as frequently and obviously enabled as spam detectors were on blog comments back when blogs had comments.
Certainly! I’m right there with you.
>If you picked a random person off the street and asked for their thoughts on em dashes, you'd probably get a blank stare.

But the people you will reach online will be online, and not some random person-off-the-street. The average person on the street will give the same blank stare on the topic of compilers, regular expressions, black-holes, or robotics, but I still want to read about those topics. And if I want an LLM's take on those topics, everyone knows where to turn to get that.

Approximately everyone is online. What percentage of those people do you think are even familiar with the meme that em dashes == LLM assistance, much less feel strongly enough to complain or attack you over your punctuation choices?
seen it commonly on many large / front page reddit threads.
But because the average person on the street would give blank stares about compilers, we can conclude that compilers aren't important /s
> Eventually, as models and their users both improve, we'll collectively realize that trying to reliably discriminate between AI and human writing is no different than reading tea leaves. We should judge content based on its intrinsic value, not its provenance.

There are zillions of words produced every second, your time is the most valuable resource you have, and actually existing LLM output (as opposed to some theoretical perfect future) is almost always not worth reading. Like it or not (and personally I hate it), the ability to dismiss things that are not worth reading like a chicken sexer who's picked up a male is now one of the most valuable life skills.

Putting aside the claim that "LLM output [...] is almost always not worth reading"[1], the whole issue here is that this supposed ability of determining whether or not content is AI-generated doesn't exist. Is it really a valuable life skill to decide whether or not you want to read something based solely on its density of em dashes?

Of course there are cases where you can tell that some text is almost certainly LLM output, because it matches what ChatGPT might reply with to a basic prompt. You can also tell when a piece of writing is copied and pasted from Wikipedia, or a copy of a page of Google results. Would any of that somehow be more worth reading if the author posted a video of themselves carefully typing it up by hand?

1: You're assuming a specific type of output in a specific type of context. If LLM output were never worth reading, ChatGPT would have no users.

> Is it really a valuable life skill to decide whether or not you want to read something based solely on its density of em dashes?

Having good heuristics to make quick judgements is a valuable life skill. If you don't, you're going to get swamped.

> Would any of that somehow be more worth reading if the author posted a video of themselves carefully typing it up by hand?

No, but the volume of carefully hand-typed junk is more manageable. Compare with spam: Individually written marketing emails might be just as worthless as machine-generated mass mailings, but the latter is what's going to fill up your inbox if you can't filter it out.

> If LLM output were never worth reading, ChatGPT would have no users.

Only if all potential users were wise. Plenty of people waste their time and money in all sorts of ways.

Having good heuristics is a valuable life skill. Presence of standard punctuation is not a good heuristic.
If it's stupid and it works, it's not stupid.
If it's stupid and it works once, chances are it's stupid and you just got lucky.
This seems like begging the question to me.

Why do you think it's not a good heuristic to be able to quickly spot the tell-tale signs of LLM involvement, before you've wasted time reading slop?

Yes, there will be false positives. It's a heuristic after all.

Because the false positive rate is unacceptably high — we're talking about a standard, widely used character — and because if the heuristic becomes widespread enough to matter, then it will be trivially circumvented by bad actors anyway. Who is it helping if we collectively bully ourselves into excising a perfectly good punctuation mark from human language?

If anything, I'd rather that renderers like Markdown just all agree to change " - " to an en dash and " -- " to an em dash. Then we could put the matter to bed once and for all.
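A minimal sketch of that substitution, not how any existing Markdown renderer behaves. The function name `smarten_dashes` is hypothetical, and keeping the surrounding spaces is an assumption, since the comment above doesn't say whether they should survive. The dashes are written as Unicode escapes so the source itself stays plain ASCII.

```python
EN_DASH = "\u2013"  # en dash
EM_DASH = "\u2014"  # em dash


def smarten_dashes(text: str) -> str:
    """Turn a spaced double hyphen into an em dash and a spaced single hyphen into an en dash.

    Assumption: the spaces around the hyphens are preserved. Hyphens inside
    words (e.g. "side-effect") have no surrounding spaces and are left alone.
    """
    text = text.replace(" -- ", f" {EM_DASH} ")
    text = text.replace(" - ", f" {EN_DASH} ")
    return text


if __name__ == "__main__":
    print(smarten_dashes("30 - 60 minutes -- give or take"))
```

A plain string replacement like this would also rewrite hyphens inside code spans or list markers, so a real renderer would need to skip those contexts.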

Honestly it is currently _good enough_. It's obviously imperfect, like ~all heuristics, but its false positive rate should be fairly low.
"Should be fairly low" isn't a safe assumption without robust data to back that up. I think it's more likely to be unacceptably high. Dashes are standard punctuation marks available through Android/iOS/macOS keyboards, and automatically inserted into text by common tools like Microsoft Word — not some obscure Unicode character. What's next, are we going to start flagging any text that ends in a question mark as "AI-generated"?
> 1: You're assuming a specific type of output in a specific type of context. If LLM output were never worth reading, ChatGPT would have no users.

I think nobody is upset about reading an LLM's output when they are directly interacting with a tool that produces such output, such as ChatGPT or Copilot.

The problem is when they are reading/watching stuff in the wild and it suddenly becomes clear it was generated by AI rather than by another human being. Again, not in a context of "this pull request contains code generated by an LLM" (expected) but "this article or book was partly or completely generated by an LLM" (unexpected and likely unwanted).

Right, that's part of what I'm getting at. There are two primary cases when LLM output tends to be bad:

1. In the context of research/querying, when unverified information from its output is falsely passed off as verified information curated by a human author. There's a big difference between "ChatGPT or some blog claims X" and "the answer is X".

2. In the context of writing/communication, when it's used to stretch a small amount of information into a relatively large amount of text. There's a big difference between using an LLM to help revise or trim down your writing, or to have it put together a first draft based on a list of detailed bullet points, and expecting it to stretch one sentence into a whole essay of greater value than the original sentence.

Those are basic misuses of the tool. It's like watching an old person try to use Google 20 years ago and concluding that search engines are slop and the only reliable way to find information is through the index of Encyclopedia Britannica.

> this supposed ability of determining whether or not content is AI-generated doesn't exist.

It seems like you’re just wrong here? Em dashes aside, the ‘style’ of LLM-generated text is pretty distinct, and is something many people are able to distinguish.

No, I'm not wrong. Someone could easily write in the default output style of ChatGPT by hand (which will probably become increasingly common the longer that style remains in place), and someone could easily collaborate with ChatGPT on writing that looks nothing like what you're thinking.

If organizations like schools are going to rely on tools that claim to detect AI-generated text with a useful level of reliability, they better have zero false positives. But of course they can't, because unless the tool involves time travel that isn't possible. At best, such tools can detect non-ASCII punctuation marks and overly cliched/formulaic writing, neither of which is academic dishonesty.

Okay, you’re right that the LLM writing style isn’t singularly producible by LLMs. However, I’m not sure why this writing style would become increasingly common? I don’t see why people would mimic text that is seen as low quality or associated with academic dishonesty.

Additionally, I do think it is valuable to determine if a piece of text is valuable, or more precisely, what I’m looking for. As others have said, if I want info from a LLM about a subject, it is trivial for me to get that. Oftentimes I am looking for text written by people though.

> However, I’m not sure why this writing style would become increasingly common?

I was basing that on a few factors, off the top of my head:

1. Someone might pick up mannerisms while using LLMs to help learn a new language, similarly to how an old friend of mine from Germany spoke English with an Australian accent because of where she learned English.

2. Lonely or asocial people who spend too much time with LLMs might subconsciously pick up habits from them.

3. Generation Beta will never have known a world without LLMs. It's not that difficult to imagine that ChatGPT will be a major formative influence on many of them.

> As others have said, if I want info from a LLM about a subject, it is trivial for me to get that.

Sure, it's trivial for anyone to look up a simple fact. It's not so trivial for you to spend an hour deep-diving into a subject with an LLM and manually fact-checking information it provides before eventually landing on an LLM-generated blurb that provides exactly the information you were looking for. It's also not trivial for you to reproduce the list of detailed hand-written bullet points that someone might have provided as source material for an LLM to generate a first draft.

Provenance matters because LLM writing is cheap compared to actually having to think about what to say.

I only have a limited amount of time to read. Skipping someone's Internet comment because it looks like spam often means I get to engage with something else.

I don't see that provenance matters per se. LLM-assisted writing is comparatively cheaper than producing the same writing without an LLM, but not inherently cheap in absolute terms.

If someone who typically bills $500/hr spends 30 - 60 minutes on a comment or blog post, that's still $250 - 500 worth of their time invested regardless of whether or not an LLM was involved. An LLM is comparatively cheaper than hiring a human editor or research assistant, but it's not negative cost.

Likewise, prompting ChatGPT with "write a blog post about bees" may be cheaper than hiring someone off Fiverr to respond to the exact same prompt, but in either case the resulting content will be low-value (yet still higher-value than the string "write a blog post about bees") because its source material was cheap. The fact that the latter version would have been written by a human is incidental.

>A year ago the red flag du jour was "delve"

"delve" was a red flag 650 years ago!

When Adam delved and Eve span, who was then the gentleman? — Fr John Ball's sermon addressing the rebels of the Peasants' Revolt, 1381

Always wondered a bit about that one; both mining and spinning seem weirdly high-tech for Adam and Eve.
Your opinion is not necessarily one I agree or disagree with, but I feel that you've dismissed the entire article due to one part of it.

I think there is a very interesting discussion to be had over how LLMs are actively changing the way we write, or even speak.

Just to clarify, I didn't dismiss the article. I'm agreeing with one of the author's points that we shouldn't dumb down our writing to please a vocal minority.
It’s not even just em dashes, it’s the same style of posts people make that matches how ChatGPT talks with simple prompts.
Right, but that's bad writing because it's awkward to read and/or overly cliched. The fact that it may have been AI-generated is incidental. It would still be bad writing if someone happened to write in the exact same style by hand.
> Eventually, as models and their users both improve

I like how this is presented as a given thing that will happen, that models are going to just improve forever. That there isn’t some plateau on “user skill with LLMs” like it’s fucking calculus mixed with rocket science that only the elite users will ever attain full fluency in using.

This is starting to read like religious cult propaganda, which is probably scarier than whatever else ends up happening with this shit.

Are you implying that it's impossible to attain greater proficiency with LLM usage than one would have on their first day opening ChatGPT? The concepts of "Google-fu" and "tech literacy" must also be alien to you.
If you think I’m implying that, English must not be your first language or you’re being intentionally obtuse. Neither would surprise me.

Google fu, before Google fucked everything up, was not hard to learn, and then plateaued. It’s not like it was hard to do, and it’s not like once you figured it out there was this boundless growth potential to keep learning. You learned algebra one. Congrats, that’s all there was to it.

Tech literacy I’m not even going to address, because again either you don’t understand what that means or you’re being intentionally obtuse.

Feel free to state your point clearly without the juvenile insults, if you have one.
Prompt engineering isn’t hard to master, weeks to months tops. That is the point. LLMs are at the top of the s-curve.

Assuming prompt engineering is hard, assuming that LLMs are going to continue to make any kind of substantial leap without _any_ evidence other than blind faith, is as close to believing in a religion as it gets. Having blind “faith” in this house of cards, saying things like “when things continue to advance” without any evidence that there will be any advancement, is absolutely insane.

I’m having a hard time believing you needed me to spell that out.

Define "hard to master". I use LLMs all the time, sometimes as a writing assistant, and have never had the problem of trying to pass off a GPT response to a single one-line prompt as my own words. I haven't found LLMs hard to master.

An 80-year-old who barely uses computers and still types full sentences into Google (or who struggled for years to unlearn that habit) might find LLMs hard to master. Someone with poor written communication skills might find LLMs hard to master. Shockingly, it turns out that different people have different skills and life experiences.

I never used the word "faith". I'm not sure why you feel the need to make up a straw man to attack rather than respond to my comment as written, or why you feel the need to repeatedly insult me and accuse me of bad faith. It sounds like you're more interested in trying to win some perceived argument than engaging in constructive discussion.

Just use double single dashes--like this. Instead of a long—dash.