Hacker News
by mattnewton 18 days ago
I don’t disagree with your conclusion that this is likely AI-rewritten, but I do find it strange that you say “normal people don’t write like this” when it is mimicking how people write, and using patterns I have seen people use. I think models are at the point where style is not really reliable as an indicator anymore.
6 comments

people sure do write like that, in novels. nobody writes scientific articles like novels, because scientific articles don't need to maximally capture audience attention. the purpose of a scientific article is to convey information - this pursuit is not assisted by punchy prose.
A lot of the common patterns people ping as AI (like "it's not X, it's Y"*) are marketing-speak, of which there's a lot on the internet. It's applying existing patterns in unusual locations, ignoring the original context.

The one they're pointing out (the short punchy sentences) also applies to things like politicians and news articles. Blog posts are a bit odd.

* And here I mean those literal exact words. People are also extrapolating to similar patterns that use different or more words than "it's not" and "it's", but those flow better and aren't what I'm referring to here.

I'm sure there's plenty of writing in the above style to be found on the Internet, and hence in the LLM's training data. I'm also not a fan of this style, and in particular I'd say it's rarely or never found in scientific / technical writing meant to convey understanding rather than sell or hype. So here it's IMO more of a style mismatch.
LLMs average out all the writing they were trained on. Individuality and idiosyncrasy are flattened out or removed. That's why it all reads the same.
It’s not a model of an author, it’s a model of documents. That’s not the same thing.
No, but sufficiently-advanced overfitting would lead to the model keeping track of an author's stylistic profile, in the same way it keeps track of the plot of a story it's writing (i.e., badly, but well enough that you have to pay attention to notice that something is wrong).
It is trained on its own slop. They haven’t trained these models on books for three years at this point. Only on generated slop. (And RL slop upvotes/downvotes from users)