Hacker News
by Gormo 24 days ago
Almost everything on that list seems invalid, at least as simple checklist criteria. All of the tropes involved are ubiquitous in ordinary semi-formal writing, which is why the LLMs are using them in the first place.

It's more reasonable to suggest that a mismatch in tone or register between the style of writing and the venue it's being published in could be an indicator of AI. It's also possible that people who misidentify these tropes as AI indicators per se are themselves suffering from a filter-bubble effect: someone who doesn't typically read long-form writing might encounter its conventions only in AI-generated content, and so misattribute them to AI itself.

Even that isn't such a great criterion on sites where different userbases interact with the site in different ways. For example, Reddit has a large "old guard" userbase that treats it like a traditional message board, with longer-form and more in-depth discussion, along with a lot of more recent users who treat it like Twitter and expect everything to be short and informal. Users in the latter group misidentify posts by those in the former group as AI more and more frequently.


The list isn't invalid. Those tropes existed before AI, but they weren't used as incessantly as AI uses them.
What mechanism do you suppose would cause LLMs to use writing tropes at a significantly greater rate than is found in their training data?
I have no idea, but the fact is they do!

Probably something to do with the way they do RLHF?

> I have no idea, but the fact is they do!

Well, that's the assumption I'm challenging. That does not seem to be a substantiated fact at all.

Most of the arguments I've encountered online that attempt to make the case that LLMs do have a writing style that's readily distinguishable from human writing are deeply afflicted with confirmation bias, and are often based on circular reasoning.

My comment above was a rhetorical question intended to point out the fact that LLMs are specifically designed to mimic the patterns of writing found in their training data, so the idea that they'd all converge to some other set of patterns is not by itself plausible.

> Well, that's the assumption I'm challenging. That does not seem to be a substantiated fact at all.

It's not an assumption or confirmation bias. It's plain as day if you read any of these AI-generated (or assisted) articles. You may as well ask me for proof that HN readers are pedantic.

It's not possible for any of this to be "plain as day" without being substantiated by data. There is no way to validate a statistical indicator without being able to independently measure the target variable.

Validating your criteria against the same subjective impressions that informed those criteria in the first place is an exercise in both circular reasoning and confirmation bias.
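
To make the validation point concrete, here is a minimal sketch of what checking a trope-based indicator against an independently measured target variable could look like. Everything in it is an assumption for illustration: the `TROPES` list, the `flags_as_ai` rule, and the toy samples are hypothetical, and the `is_ai` labels are presumed to come from known provenance rather than from anyone's impression of the text.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    is_ai: bool  # assumed known independently (e.g. provenance), not judged from the text


# Hypothetical trope markers, picked only for illustration.
TROPES = ("delve", "in conclusion", "it's worth noting")


def flags_as_ai(text: str, threshold: int = 1) -> bool:
    """Checklist-style indicator: flag text with at least `threshold` trope hits."""
    lowered = text.lower()
    hits = sum(lowered.count(trope) for trope in TROPES)
    return hits >= threshold


def evaluate(samples: list[Sample]) -> dict[str, float]:
    """Compare the indicator's flags against the independent labels."""
    tp = fp = fn = tn = 0
    for s in samples:
        flagged = flags_as_ai(s.text)
        if flagged and s.is_ai:
            tp += 1
        elif flagged and not s.is_ai:
            fp += 1
        elif not flagged and s.is_ai:
            fn += 1
        else:
            tn += 1
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy, made-up samples; real validation needs a corpus with known provenance.
TOY_SAMPLES = [
    Sample("Let's delve into the details. In conclusion, it works.", True),
    Sample("Short answer: yes.", True),
    Sample("In conclusion, the old guard simply writes longer posts.", False),
    Sample("lol no", False),
]

if __name__ == "__main__":
    print(evaluate(TOY_SAMPLES))
```

The numbers from the toy data mean nothing in themselves; they only show the mechanics. The relevant part is that none of the three metrics can be computed without the `is_ai` column, and if that column is filled in by the same impressions that produced the trope list, the evaluation only measures agreement with itself.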