| HN Mirror



	by postalrat 19 days ago
	Where do you think llms learned to write that way?

6 comments

jlund-molfese 19 days ago

You can also look at past posts by the same author (before LLM usage proliferated) if you’re curious.

The project is still very cool, but it’s a little less enjoyable to read when everything sounds the same. It would be just as annoying for people to manually write in a corporate/marketing style, because humanity is what makes the small web interesting.

https://blog.tymscar.com/posts/privategithubcicd/

tymscar 19 days ago

I’m glad I started this blog before the AI wave so I can prove to people I’m just weird at writing.

It grinds my gears how so many people just talk about my writing style instead of the content.

fouc 18 days ago

What's interesting about the older post is that all the sentences are long, compared to the current datacenter GPU post which contains lots of short sentences.

But yeah, probably feels sucky to have your style analyzed for AI writing. FWIW, the datacenter GPU post was great! I went to look at the ebay postings.
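For anyone who wants to check the sentence-length observation rather than eyeball it, here is a minimal sketch. It assumes you have pasted the prose of each post into a plain string; the function names and the 8-word cutoff for a "short" sentence are arbitrary choices for illustration, and the sentence splitter is deliberately naive (it will miscount around abbreviations and code snippets).

```python
import re
from statistics import mean, median


def sentence_lengths(text: str) -> list[int]:
    """Return the word count of each sentence.

    Sentences are split naively on '.', '!' or '?' followed by whitespace.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def summarize(text: str, short_cutoff: int = 8) -> dict[str, float]:
    """Summarize sentence lengths; short_cutoff is an arbitrary threshold."""
    lengths = sentence_lengths(text)
    if not lengths:
        raise ValueError("text contains no sentences")
    return {
        "sentences": len(lengths),
        "mean_words": mean(lengths),
        "median_words": median(lengths),
        # Fraction of sentences at or below the cutoff.
        "share_short": sum(n <= short_cutoff for n in lengths) / len(lengths),
    }
```

Running `summarize` on an old post and a recent one gives two sets of numbers to compare side by side. It only describes style; it says nothing about who or what wrote the text.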

tymscar 18 days ago

Thanks!

I did get feedback on all sorts of things over the years.

One of them was to do with sentence lengths.

LearnYouALisp 11 days ago

I mean, if they're serious, put it through 10-20 "AI Detectors" that cover different models. (I've had to do that with samples of new 'textbooks' showing up on Amazon under various false names.)

lelanthran 19 days ago

> I’m glad I started this blog before the AI wave so I can prove to people I’m just weird at writing.

Your previous blog posts didn't trigger any LLM detector (go on - check for yourself).

tymscar 19 days ago

Neither does this one. I replied in another thread. It comes out as 0% and the one from 2021 comes out as 8%. LLM detectors are all BS

lelanthran 19 days ago

GPTzero says 100% AI generated for specific paragraphs that I chose (such as `Multi-token prediction`). If you remove all the code listings, tables, etc and just paste the prose into these tools, it drops to 87% AI generated.

None of the three older blog posts of yours that I tried went above 5% AI generated.

Maybe you're spending so much time with the LLM that you are talking like it; in which case, take an old blog post and a recent one, give the prose from both to your favourite LLM, and ask it whether the same author wrote both. I just did that on ChatGPT and on Gemini, and both found that it is extremely unlikely that the same author wrote both.

Look, if all the SOTA LLMs agree that your recent blog posts sound generated, you can't blame the reader, can you?

tymscar 19 days ago

GPTzero is a joke.

It thinks this is AI: “I bought a datacenter GPU that doesn’t even have a normal PCIe connector, stuck it in my gaming PC with an adapter, and now I have 32GB of VRAM across two GPUs running a 27 billion parameter model at 32 tokens per second.”

There’s nothing AI about that. Not all SOTA LLMs agree, hell, none of them do. The same exact example I sent here gives me 0% in some, 10% in others, 100% in GPTzero.

iugtmkbdfil834 19 days ago

This. Setting aside the LLM issue, the post deals with hardware in ways that, one would think, would be celebrated on HN of all places. But we focus on presentation.

tgv 19 days ago

Because their custom training data contains an emphasis on such verbiage. It doesn't come from the God-knows-how-many TB of web content the model is pre-trained on. There, such phrasing is only a drop in the sea. But the "yes, you're right" phrases, the em dash, etc., come from the later stage, for which content is created according to some (probably overprecise) guidelines.

rafram 19 days ago

Right. The overuse of "genuinely" most of all. Seems like they put Claude through a few good rounds of training to always answer questions about its consciousness, thoughts, etc., with something about how it's "genuinely unsure," and as a result, the model learned to use "genuinely" as an intensifier in all sorts of inappropriate contexts.

iugtmkbdfil834 19 days ago

Oi, I personally use adverbs everywhere. Genuinely, kids these days.

anigbrowl 18 days ago

It's a very specific style of condescending journalism that US media has been nurturing and recycling for decades now. I was going to write this whole comment as a parody of it, starting with some literary hook like 'Call it Ouroboros syndrome:' but I can't bring myself to add to the pile.

I have not done the textual and statistical analysis to verify this, but I feel like it's something you could trace back to east coast journalism schools and publishers mediated via television, which long predates mass adoption of AI. Think how many news articles you've read with titles like 'Anatomy of a murder' or 'Inside the meeting that changed everything.' The hooky, slightly pompous tone is something you can find back as far as the 1960s or 1970s; browse through old issues of Reader's Digest and you'll find tons of it. When I say it's mediated through television, I'm talking about both the dramatic and heavily conclusory style of fictional prosecutors and narrators, and the extremely shallow style of TV news reports (often transcribed to the web) which are only one or two sentences per paragraph. And this is before we consider the stylistic impact of ad copywriting on communication in general.

And there's something else.

The one-sentence paragraph interjection, designed to refocus your attention in a surprising new direction after two paragraphs of stuff you already know. 'I never thought I'd end up here,' said Sally Nocontext, hooking you in for another paragraph or two where you try to figure out who this woman is, where she ended up, and what it has to do with the article you are already halfway through reading. After all, I've come this far, the reader thinks. I might as well see it through to the end.

And that's just what publishers wanted.

One sentence can also validate a truism that the reader already suspects, flattering their beliefs in their own analytical powers....

...well, you get the idea. When I'm using LLMs for any sort of extended session, I find myself reaching for the same few prompts to break it of such clichéd expression; I'm especially averse to the habit of adding zippy-sounding nicknames to complex or potentially dull concepts. I don't have a favorite starting prompt, but I generally find that asking for 'a concise, academic tone' does wonders to de-fluff its output. Remember, it defaults toward being as widely accessible as possible, and much journalism is aimed at consumers with only a high school education and maybe middle-school reading comprehension, math ability, and appetite for depth over sensation.

LearnYouALisp 11 days ago

How do I save a comment; this very context has been under consideration for a while.

krackers 18 days ago

This particular case seems to be an LLM trying to blindly apply the saccharine therapeutic pattern "your frustration is real" to a context where it doesn't make sense. No one is debating or questioning the fact that the card has 16 gigs of vram.

The point of "X is real" in a therapeutic context is to make the person feel seen and acknowledged, that his struggles are real to him and really do weigh on his mind, even if it is technically "all in his head".

LearnYouALisp 11 days ago

It goes both ways, too -- people see all this on Medium, elsewhere, even in the comments on threads in YC

lelanthran 19 days ago

> Where do you think llms learned to write that way?

Not from individual human content, that's for sure - maybe MLM marketing copy? Sleazy 4AM ads?

I mean, every time this response comes up, I keep asking the person to point at something written prior to 2022 that gets 80%+ on the LLM detectors, and yet no one can find anything.

Maybe you, postalrat, can find something written in this style that was published prior to 2022.

tymscar 19 days ago

I have written the blog post. I know empirically that I have used 0% AI while writing it. I also know LLM detectors are total BS and they don't really work. I have tried a couple on this exact blog post, and QuillBot, for example, gave me 0% AI detected on it.

I have then used a blog post of mine from 2021. QuillBot gave me 8%...

The King James version of the Bible came out at almost 100% AI generated a while ago. It was on the HN front page.

Stop thinking that if someone writes in a way that is fun or looks like what you would think an AI writes, then it is AI generated. Loads of the time it is, but sometimes it's not, and it really hurts those like me.

lelanthran 19 days ago

> I have tried a couple on this exact blog post, and QuillBot, for example, gave me 0% AI detected on it.

Don't use Quillbot; not sure why, but their model is reluctant to classify anything as AI generated. I ran into this when proof-reading a student's PhD: ChatGPT, Gemini, Claude (and others) all agreed it was AI generated, but Quillbot said it wasn't.

tapete1 18 days ago

So you are the AI?

I mean, seriously, which human says "the compute"?

hattmall 19 days ago

It's a function of the LLM "thought process"! It's not really modeled after human speech. It is in short segments but not long form, same reason you see the same rather odd nuances in LLM generated code.

If the way you thought was to run a bunch of if statements, generate content, then feed that content back to get a "score" of what seems the most plausible, run the if statements again, and adjust / merge responses, then you would write similarly. The recognizable cadence of LLM generated content is pretty clearly the result of a lot of if statements being fused together.

alehlopeh 19 days ago

Marketing content.