Funny enough, I avoided the em dash, because everyone was using hyphens and I didn't want forensic linguistics bored. Now that AI got my FBI agents on welfare and em dashed the internet kaputt, now that I am liberated, I can't tell an em dash and hyphen apart, hand–written in my diary.
Genuine question, do you actually use the formal emdash in your writing? AIs are very consistent about using the proper emdash—a double long dash with no spaces around it, whereas humans almost always tend to use a slang version - a single dash with spaces around it. That's because most keyboards don't have an emdash key, and few people even know how to produce an actual emdash.
That's what makes it such a good giveaway. I'm happy to be told that I'm wrong, and that you do actually use the proper double long dash in your writing, but I'm guessing that you actually use the human slang for an emdash, which is visually different and easily sets your writing apart as not AI writing!
I have a Compose key binding in https://github.com/kragen/xcompose which maps Compose Space Minus to "—" with two thin spaces on each side of it, because I prefer the spaces. But HN rewrites the thin spaces to regular spaces, so on HN I just use "—" without the spaces, the way ChatGPT does, which is Compose Minus Minus Minus, and is in the standard Compose key bindings (if you map your keyboard to have a Compose key at all).
Same. I feel like I've been using "--" in my online writing for decades now. Take that, LLMs; I used it before it was cool... er... before it was a weak signal that a piece of text was written by an LLM.
In LaTeX (and probably smartypants which is another of those bare pre-unicode ASCII to fancy text converters that can get stacked into markdown--but I can't remember if dash handling specifically is in there), "--" is en-dash and "---" is em-dash. The single "-" gives a hyphen which is handled differently than an en-dash in typesetting.
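For anyone who hasn't seen that convention, here is a rough Python sketch of it. It only mimics the visible result: real TeX does this through font ligatures rather than string replacement, and `tex_dashes` is a made-up name, not part of any library.

```python
def tex_dashes(text: str) -> str:
    """Mimic TeX-style dash input: '---' becomes an em dash, '--' an en dash."""
    # Replace the longer run first so '---' is not consumed as '--' plus '-'.
    text = text.replace("---", "\u2014")
    text = text.replace("--", "\u2013")
    return text


print(tex_dashes("pages 10--12"))   # en dash between the numbers
print(tex_dashes("wait---what?"))   # em dash, set closed
print(tex_dashes("well-known"))     # a single hyphen is left alone
```

The ordering of the two replacements is the only subtle part: doing "--" first would turn "---" into an en dash followed by a stray hyphen.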
So... that's just to say that people who are exposed to the sorts of can't-unsee-it-now typesetting OCD that LaTeX and various popular extension packages within that ecosystem expose can learn to write "--" as en-dash.
It's sort of like being unable to return to the blissful state of not being hyperaware that Arial and Helvetica are different.
Macs and iDevices have been auto-transforming -- into – for well over a decade now, and on the iOS standard keyboard both – and — are just a single long press of the dash key away.
Microsoft Word does this too. I've recently started manually uncorrecting these corrections in my writing because of this new implication that I used ChatGPT.
Still less obvious than the emails I see sent out which contain emojis, so maybe I'm overthinking things...
This is a ridiculous maladaptive behavior. Word has been replacing dashes forever, consequently it has been unintentionally ubiquitous in business writing forever. That this character ever became a heuristic for AI is silly.
the dashes and the auto capitalization are awful for technical writing. The ctrl-z becomes painful and annoying very quickly. Would that Word supported markdown.
It’s a per-app setting that sometimes needs to be set in the text field’s context menu. There’s also a few apps that just don’t integrate with the macOS text system.
I would write it with Option-Shift-hyphen when I used macOS.
On Linux, I use Compose-hyphen-hyphen-hyphen.
I don't use it as often as I used to; but when I was younger, I was enough of a nerd to use it in my writing all the time. And yes, always careful to use it correctly, and not confuse it with an en-dash. Also used to write out proper balanced curly quotes on macOS, before it was done automatically in many places.
I always used to google search "emdash unicode" and copy-paste the character, but I guess now I'll save several minutes from my essay-writing by switching to the lazy single-dash typology that I don't like the look of. Soon I'm going to have to start throwing in speling errors and other things too.
> That's because most keyboards don't have an emdash key, and few people even know how to produce an actual emdash.
There’s a subculture effect: this has been trivial on Apple devices for a long time—I’m pretty sure I learned the Shift-Option-hyphen shortcut in the 90s, long before iOS introduced the long-press shortcut—and that’s also been a world disproportionately popular with the kind of people who care about this kind of detail. If you spend time in communities with designers, writers, etc. your sense of what’s common is wildly off the average.
> Genuine question, do you actually use the formal emdash in your writing?
"the formal emdash"?
> AIs are very consistent about using the proper emdash—a double long dash with no spaces around it
Setting an em-dash closed is separate from whether you are using an em-dash (and an em-dash is exactly what it says, a dash that is the width of the em-width of the font; "double long" is fine, I guess, if you consider the en-dash "single long", but not if, as you seem to be, you take the standard width as that of the ASCII hyphen-minus, which is usually considerably narrower than en width in a proportional font.)
But, yes, most people who intentionally use em-dashes are doing so because they care about detail enough that they are also going to set them closed, at least in the uses where that is standard. (There are uses where it is conventional to set them half-closed, but that's not important here.)
> whereas humans almost always tend to use a slang version - a single dash with spaces around it.
That's not an em-dash (and its not even an approximation of one, using a hyphen-minus set open—possibly doubled—is an approximation of the typographic convention of using an en-dash set open – different style guides prefer that for certain uses for which other guides prefer an em-dash set closed.) But I disagree with your claim that "most humans" who describe themselves as using em-dashes instead are actually just approximating the use of en-dashes set open with the easier-to-type hyphen-minus.
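Since the thread keeps mixing these up, here is a small Python sketch that prints the three characters under discussion with their Unicode code points and official names. It uses only the standard `unicodedata` module; the helper name `describe` is just for illustration.

```python
import unicodedata

# The three characters people in this thread are calling "a dash".
DASHES = ["-", "\u2013", "\u2014"]


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in DASHES:
    print(describe(ch))
# U+002D HYPHEN-MINUS
# U+2013 EN DASH
# U+2014 EM DASH
```

They are three distinct characters, so "a single dash with spaces around it" typed on a plain keyboard is a hyphen-minus, whatever it is standing in for.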
It was an abuse of “slang” to mean “typographic approximation”; now what, exactly, did you think “and its not even an approximation of one, using a hyphen-minus set open—possibly doubled—is an approximation of the typographic convention of using an en-dash set open” meant?
I do use the hyphen-minus set open sometimes - I'd prefer em-dash closed everywhere, but sometimes it's difficult to type an em-dash, and if I'm having to use hyphen, a closed hyphen looks very wrong. Similarly, "--" is shorthand for en-dash as you say, and "---" (even closed) looks too busy.
I’ve used “real” em-dashes and en-dashes in my writing generally since I switched to using Macs about 20 years ago. Before that I used them for e.g. academic writing, which I mainly did in LaTeX, but not so often elsewhere.
They’re simple enough key combinations (on a Mac) that I wouldn’t be surprised if I guessed them. I certainly find it confusing to imagine someone who has to write professionally or academically not working out how to type them for those purposes at least.
I will use a double hyphen: -- which Microsoft Word and I think most word processors I've used will auto-replace with an em dash. I will sometimes even type the double hyphen to represent an em dash in places where it doesn't get replaced, like internet comments. I'm kind of surprised more people don't use two hyphens as em dash shorthand, to be honest.
IIRC, -- for emdash used to be common on Usenet, which is where I picked it up and still do it. But there's a word for us with Usenet experience -- old. (should have been a colon there, but...)
Most people probably don't. I'm an editor who's been working in print for years, so the keyboard shortcut for an em dash is muscle memory for me at this point. I have always been a Chicago Manual of Style person, so I don't place spaces around the em dash. AP style guide users do place a space around it.
I have --- set to autocorrect to —. I've been using it in formal writing for 30 years. When we were in high school, we had a "Dash Party" in English class, where we ate Twinkies and learned about the different dashes.
I would argue that LLMs overuse the emdash more because they overuse specific rhetorical devices, e.g. antithesis, than because they are being too correct about punctuation.
> Genuine question, do you actually use the formal emdash in your writing?
I’m not the person you asked, but I do.
> the proper emdash—a double long dash with no spaces around it
The spaces around it depend on style guide, it is not universal that they should not exist.
> That's because most keyboards don't have an emdash key
Nor do they have keys for proper quotes and apostrophes or interrobangs, yet it doesn’t stop people from using them. The keys don’t need to exist.
> That's what makes it such a good giveaway.
It’s not. It might be one signal but it is far from sufficient.
> I'm happy to be told that I'm wrong, and that you do actually use the proper double long dash in your writing
I do use the proper em-dash in my writing—and many other characters too—and my HN history is ample proof. I explained at length in another comment how I insert the characters, plus how simple it is if you use any Apple OS.
Nah, I've been using them correctly for years. My preference for them came by way of reading a lot in general, but especially from Salinger.
I do them without surrounding spaces, because that's... how you're supposed to use them, and it's also less typing.
They also used to be a really good shibboleth to tell if someone was using a Mac—the key combo on there is easy, and also easy to remember, so Mac users were far more likely than the median to employ em-dashes. It wasn't a sure tell, but it was pretty reliable.
I use en dash with two spaces and have done so before AI. But my comments here are from after GPT 4 released, so I guess I can't prove I didn't use AI to write them, although I don't think any AIs use that style. Here is one from February 2024: https://news.ycombinator.com/item?id=39386480. I don't like how "-" looks, it just looks like a minus sign and too short.
Been using shift+option+hyphen to make and use em-dashes (sans spaces) since at least 2005, when I got my first publishing job and also started blogging (so writing a ton more). I also use option+hyphen (en-dash) for date and number ranges. In my experience, ChatGPT consistently adds spaces around both.
I've had an AutoHotkey replacement for the proper em dash character for over 10 years, using shorthand characters which trigger the replacement. Whether spaces are around the dash is a difference in style (see: various publications' style guides), though I use the no spaces style.
Being able to insert self-interjections and such with the correct character would undoubtedly be more widespread if it were more accessible to insert for most.
I, for one, use an actual em dash in my writing—or at least I used to. Option + Shift + the hyphen key on Mac. I never knew if I was using it correctly, but I'd learn to copy how I'd seen it used in books and articles and things. Now, I have an incessant paranoia around using it.
The same way it learned to act like a personal assistant, even though very few humans are personal assistants.
The LLM is first trained as an extremely large Markov model predicting text scraped from the entire Internet. Ideally, a well trained such Markov model would use em dashes approximately as frequently as they appear in real texts.
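To make that framing concrete, here is a toy sketch in Python. It is only an analogy: actual pretraining uses a neural network over long contexts, not bigram counts, and the tiny corpus and the function names below are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_prob(counts, prev, nxt):
    """Estimated probability that `nxt` follows `prev` in the training text."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Hypothetical corpus: "work" is followed by an em dash once and a comma once.
corpus = "I went to work \u2014 but forgot my keys . I went to work , then home .".split()
model = train_bigrams(corpus)

print(next_token_prob(model, "work", "\u2014"))  # 0.5
```

A model built this way emits the em dash exactly as often as its training text does, which is the point being made: any overuse has to come from a later training stage.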
But that model is not the LLM you actually interact with. The LLM you interact with is trained by something called Reinforcement Learning from Human Feedback, which involves people reading, rating and editing its responses, biasing the outputs and giving the model a "persona".
That persona is the actual LLM you interact with. Since em dash usage was rated highly by the people providing the feedback, the persona learned to use it much more frequently.
Option + shift + hyphen or hold hyphen on any Mac or iPhone to get an em dash. I use them very frequently, because they're the correct character for the use case.
It depends on the keyboard layout. Some (US) do have it like you described, but others have the dashes reversed.
Both make sense, to a degree. On the one hand you can argue that the em-dash—being longer—should require an extra key, but on the other hand it has more uses so it should not have the extra key to be more accessible.
I've found that people who say this sort of thing rarely change their beliefs, even after being given evidence that they are wrong. The fact is, as numerous people have pointed out, Word and other editors/word processors change '--' to an em-dash. And the "slang version" of an em-dash is "I went to work--but forgot to put on pants", not "I went to work - but forgot to put on pants".
BTW, "humans almost always tend to use" is very poor writing--pick one or the other between "almost always" and "tend to". It wouldn't be a bad thing if LLMs helped increase human literacy, so I don't know why people are so gung ho on identifying AI output based on utterly non-substantive markers like em-dashes. Having an LLM do homework is a bad thing, but that's not what we're talking about. And someone foolishly using the presence of em-dashes to detect LLM output will utterly fail against someone using an editor macro to replace em-dashes with the gawdawful ' - '.
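To illustrate that last point, here is a minimal Python sketch of the naive heuristic and the "editor macro" that defeats it. Both function names are hypothetical; this is not how any real detector works, just the em-dash test taken literally.

```python
EM_DASH = "\u2014"


def looks_like_llm(text: str) -> bool:
    """Naive heuristic from this thread: flag any text containing an em dash."""
    return EM_DASH in text


def strip_em_dashes(text: str) -> str:
    """The 'editor macro': swap each em dash for a spaced hyphen."""
    return text.replace(EM_DASH, " - ")


sample = "I went to work\u2014but forgot to put on pants."
print(looks_like_llm(sample))                   # True
print(looks_like_llm(strip_em_dashes(sample)))  # False
```

One `str.replace` call is all it takes to flip the verdict, without changing a word of the text.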
But do you call that latter thing you do “an em-dash”? Do you tell a peer “You should put an em-dash here” when what you mean is a “space en-dash space”?
P.S. Why would someone be "suspicious" of people doing their writing in Word and copying it into a comment field? Suspicious of what, exactly? What crime is being committed? The issue here is AI, not people's workflow methods. I have sometimes written lengthy comments in my editor (emacs) which gives me many more editing features, doesn't throw away my work with the wrong keystroke (not a problem at HN but it is at sites where the comment field is a pop-up), and doesn't randomly freeze or slow down radically (this seems to be a problem with my browser).
I reject everything else about that poorly reasoned "suspicious" response as well.
I type em dashes as double hyphens. Sometimes the software resolves them to a true em dash, but sometimes not.
I never use hyphens where em dashes would be correct.
I do have issues determining when a two-word phrase should or shouldn't be hyphenated. It surely doesn't help that I grew up in a bilingual English/German household, so that my first instinct is often to reject either option, and fully concatenate the two words instead.
(Whether that last comma is appropriate opens a whole other set of punctuation issues ... and yes, I do tend to deliberately misuse ellipses for effect.)
Reddit and HN are among the highest quality sources of training text and are probably weighted very heavily as "probably human" in the mainstream models.
Any source of text with huge amounts of automated and community moderation will be better quality than, say, Twitter.
That depends heavily on the subreddits you browse. There absolutely are places with high quality content, though it feels like they are getting sparser and sparser.
Not in that sense; high quality in the sense that there are a lot of actual, real people posting there, and those people tend to come from a pretty diverse set of backgrounds.
Perhaps on the smaller subreddits, but have a look at /r/all on any given day and it's obvious that real people, and diverse backgrounds, it is not. Every single subreddit that goes above a certain activity threshold collapses into the exact same state of astroturfed, mass-produced political slop targeted towards low IQ people.
Although I'm sure @stinkbeatle was joking, I should clarify that most LLMs are trained on books and online articles written by professional writers. That's why they tend to have a rich vocabulary and use things like hyphens.
I agree, HN is an amazing community with brilliant people and top quality content, but it's not enough to train an LLM.
Last thing. An LLM is just a tool, it can clean up your writing the same way a photo app can enhance your pictures. It took a while for people to accept that grandma's photos looked professional because they had filters. Same will happen with text. With ChatGPT, anyone can write like a journalist. We're just not used to grandma texting like one, yet :)
I often edit things in Word — I have a document that I can alt-tab to and type things. It has spellcheck, etc. that my browser window does not, and I’m not at risk of losing it if I refresh or something. Then copy-paste back.
Word converts any - into an em dash based on context. Guess who’s always accused of being a bot?
The thing is, AI learned to use these things because it is good typographical style represented in its training set.
Dammit -- I use my dashes all the time (though always double them like here). I hope AI didn't ruin this for me.
(I learned to use dashes like this from Philip Dick's writings, of all places, and it stuck. Bet nobody ever thought of looking for writing style in PKD!).
I encountered the TeXbook at a young and impressionable age, and ever since I've used em- and en-dashes a bit more often than a style guide would suggest. Not to mention diaereses, though those haven't been flagged as LLM stigmata yet.
My workaround (well, to be honest, I've always done this: I love a good em dash, they're terrifically satisfying to use, but I'm too lazy to type them), is to use two single dashes--like so.
Good. It's a crutch for poorly composed sentences or for prose intending to imitate the affect of poorly composed sentences. There's not a single sentence under the sun that needs an emdash. Commas and parentheses can do it all, and an excess of either is a sign of poorly edited prose.
I don't buy the pro-clanker pro-em dash movement that has come out of nowhere in the past several years.
> There's not a single sentence under the sun that needs an emdash
Sentences "need" very little, but without style and personality, writing becomes very boring. I suppose simplicity without any affectation works for raw communication of plain technical facts, but there's more to writing than that.
What's the error? I'd hyphenate "poorly-composed" (most wouldn't these days, but they can go to hell) and I think it's a bit too wordy for what it's communicating, but I don't see what I'd call an actual error.
I would personally avoid writing that "poorly composed sentences" have an "affect"—rather than the writer having or presenting an affect, or the sentences' tone being affected—as I find an implied anthropomorphizing of "sentences" in that usage, which anthropomorphizing isn't serving enough useful purpose, to my eye, that I'd want it in my writing, but I'm not sure I'd call that an error either.
What did you mean?
> Commas and parentheses can do it all, and an excess of either is a sign of poorly edited prose.
This attitude, however, is a disease of modern English literacy.
I meant "affect" and not "effect." You need to learn what affect means. I'm not asking you to learn about affect theory, but ffs no part of my sentence implied it meant "effect" and not "affect." Ugh. It doesn't even make sense. What would the "effect" of "poorly composed sentences" be? Only affect makes sense there.
For both of these examples who the fuck cares. I just evaluate AI writing people send me the same as any writing.
If they’re using AI to speed things up and deliver really clear and on point documents faster then great. If they can’t stand behind what they’re saying I will call them out.
I get AI written stuff from team members all the time. When it’s bad and is a waste of my time I just hit reply and say don’t do this.
But I’ve trained many people to use AI effectively and often with some help they can produce way better SOPs or client memos or whatever else.
It’s just a tool. It’s like getting mad someone used spell check. Which by the way, people used to actually argue back in the 80’s. Oh no we killed spelling bees what a lost tradition.
This conversation has been going on as long as I’ve been using tech which is about 4 decades.
Use of spell check is a net positive but it has led to some widespread errors, like people (and widely read publications) misspelling "led" as "lead" (pet peeve).
But yes, it's absurd to complain about LLMs resulting in increased literacy.
I think it's easier to just stop using em dashes, as much as I like them. People have latched on to this because it works a good amount of the time, so I don't think they will stop. I don't even think they should stop, because, well, it works a good amount of the time.
You're making the point that OP never actually uses the em dash, by surveying their HN comments, in order to defend the notion that no one actually used em dashes prior to their proliferation by LLMs? Or do you mean something else?
You can find an em dash in my comment history if you're curious. Despite what could be said about poor sample selection, consider the imbalance of the argument being made: the frequency of em dash use is disproportionate to the suspicion thrust upon a sample of writing. I.e., a single em dash is suspicious, regardless of how many times it might show up. Therefore, it's more likely that someone who uses em dashes—even if only rarely—will self-select to respond to a thread like this and feel compelled to defend themselves.
Haha yep. I never saw a single person use these in internet comments pre-2023. Plenty of hyphens to simulate it - like this - but not actual em dashes. No matter how many people swear up and down that they're so important.