
by capnrefsmmat 102 days ago

I work on research studying LLM writing styles, so I am going to have to steal this. I've seen plenty of lists of LLM style features, but this is the first one I noticed that mentions "tapestry", which we found is GPT-4o's second-most-overused word (after "camaraderie", for some reason).[1] We used a set of grammatical features in our initial style comparisons (like present participles, which GPT-4o loved so much that they were a pretty accurate classifier on their own), but it shouldn't be too hard to pattern-match some of these other features and quantify them.

If anyone who works on LLMs is reading, a question: When we've tried base models (no instruction tuning/RLHF, just text completion), they show far fewer stylistic anomalies like this. So it's not that the training data is weird. It's something in instruction-tuning that's doing it. Do you ask the human raters to evaluate style? Is there a rubric? Why is the instruction tuning pushing such a noticeable style shift?

[1] https://www.pnas.org/doi/10.1073/pnas.2422455122, preprint at https://arxiv.org/abs/2410.16107. Working on extending this to more recent models and other grammatical features now
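As a rough illustration of the "pattern-match and quantify" idea above, here is a minimal Python sketch. It is not the method from the paper: the `style_features` helper and the `WATCH_WORDS` list are hypothetical, and counting words that end in "-ing" is only a crude stand-in for present participles (it also catches gerunds and nouns such as "ceiling"). A real analysis would presumably use a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical watch list; both words are the ones named in the comment above.
WATCH_WORDS = {"tapestry", "camaraderie"}

def style_features(text):
    """Return crude per-1000-word rates for two style markers."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"ing_rate": 0.0, "watch_rate": 0.0}
    counts = Counter(words)
    # Rough proxy for present participles: words longer than 4 letters ending in "ing".
    ing = sum(n for w, n in counts.items() if len(w) > 4 and w.endswith("ing"))
    # Occurrences of the watch-list words.
    watch = sum(counts[w] for w in WATCH_WORDS)
    scale = 1000 / len(words)
    return {"ing_rate": ing * scale, "watch_rate": watch * scale}
```

Running the same function over human-written and model-written samples of similar genre would give two sets of rates to compare; the numbers mean little in isolation.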

7 comments

orbital-decay 102 days ago

I have nothing to contribute but speculation based on my intuition, but IMO RLHF (or rather human preference modeling in general, including the post-training dataset formatting) is a relatively small factor in this; RL-induced mode collapse is a much bigger one. Take a look at the original DeepSeek R1 Zero, the point of which was to train a model with very little human preference, because they were on a budget and human preference doesn't scale. It's pretty unhinged in its writing, like the base model, but unlike the base model it converges onto stable writing patterns, and the output diversity is as non-existent as in models with carefully engineered "personalities" like Claude. Ask it to name a random city and look at the logits, and you'll still see a pretty narrow distribution. At the same time some models with RLHF (e.g. the old RedPajama) have more diverse outputs.

Mode collapse makes the models truncate entire token trajectories and repeat themselves, and indirectly it does something MUCH deeper: they converge on an almost 1:1 input-to-output concept mapping (instead of one-to-many, like in base models). The same lack of variety can be seen in diffusion models, GANs, VAEs and any other model, regardless of the type and of whether it receives human preference training.

Moreover, these patterns are generational. Old ones get replaced with new ones, and the list in the OP is going to be obsolete in a year. This is what already happened to previous models several times, from what I can tell. Supposedly this is because they scrape the web polluted by previous gen models.
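One way to put a number on the "look at the logits" check above is the entropy of the next-token distribution. The sketch below is an illustration under assumptions: the logits are whatever your inference stack exposes for the candidate answers, and the function names are hypothetical. It uses only the standard library.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def effective_choices(logits):
    """2**entropy: roughly how many options the distribution is spread across."""
    return 2 ** entropy_bits(softmax(logits))
```

A uniform distribution over four candidate cities gives an effective count of 4, while one dominant logit pushes it toward 1. Comparing this value between a base model and its tuned counterpart on the same prompt is one way to see how narrow the distribution has become.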

lelanthran 102 days ago

Doesn't this apply to all output from a model, not just English?

IOW, won't code generated by the model have the same deficiencies with respect to lack of diversity?

orbital-decay 102 days ago

It doesn't depend on the language at all, it's a failure mode of the model itself. English, Chinese, Spanish, C++, COBOL, base64-encoded Klingon, SVGs of pelicans on bikes, emoji-ridden zoomer speak, everything is affected and has its own specific -isms and stereotypes. Besides, they're also skewed towards the pretraining set distribution, e.g. Russian generated by some models has unnatural sounding constructions learned from English which is prevailing in the dataset and where they are common, e.g. "(character) is/does X, their Y is/does Z". I don't see why it should be different for programming languages, e.g. JS idioms subtly leaking into Rust, although it's harder to detect I suppose.

djoldman 102 days ago

The RLHF is what creates these anomalies. See "delve" from Kenya and Nigeria.

Interestingly, because perplexity is the optimization objective, the pretrained models should reflect the least surprising outputs of all.
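For reference, perplexity is the exponential of the average negative log-probability a model assigns to the tokens it actually observes, so minimizing the usual cross-entropy loss also minimizes perplexity. A minimal sketch, assuming you already have the per-token probabilities (the function name is hypothetical):

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability over the observed tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))
```

A model that gives every observed token probability 0.25 has perplexity 4, as if it were choosing uniformly among four options at each step.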

capnrefsmmat 102 days ago

I've heard the Kenya and Nigeria story, but has anyone backed it up with quantitative evidence that the vocabulary LLMs overuse coincides with the vocabulary that is more common in Kenyan and Nigerian English than in American English?
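A simple form such evidence could take is a log-ratio of relative word frequencies between two corpora. The sketch below is only an illustration: the corpora, the `log_ratio` helper, and the smoothing constant are assumptions, and a real study would need large, genre-matched samples of each English variety.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercased word counts for a corpus given as one string."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def log_ratio(word, counts_a, counts_b, smoothing=0.5):
    """log2 of the word's relative frequency in corpus A over corpus B.

    Additive smoothing keeps the ratio finite when a word is absent from one corpus.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    rate_a = (counts_a[word] + smoothing) / (total_a + smoothing)
    rate_b = (counts_b[word] + smoothing) / (total_b + smoothing)
    return math.log2(rate_a / rate_b)
```

Positive values mean the word is relatively more common in corpus A. The claim would be supported if the words LLMs overuse consistently showed positive ratios for Kenyan or Nigerian English against American English.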

astrange 102 days ago

The newer Claude models constantly use the word "genuinely" because Anthropic seems to have forcibly trained them to claim to be "genuinely uncertain" about anything it doesn't want them being too certain about, like whether or not they're sentient.

andai 102 days ago

Interesting. Does this apply to all subjects? From what I understood, a major cause of hallucination was that models are inadvertently discouraged by the training from saying "I don't know." So it sounds like encouraging it to express uncertainty could improve that situation.

astrange 100 days ago

That's not a major issue. Any newer model with reasoning/web search has to be able to tell when it doesn't know something, otherwise it doesn't know when to search for it.

rafram 102 days ago

Not only is it genuinely uncertain about those topics, it’s also genuinely fascinated by them!

networked 102 days ago

You may be interested in my links on AI's writing style: https://dbohdan.com/ai-writing-style. I've just added your preprint and tropes.fyi. It has "hydrogen jukeboxes: on the crammed poetics of 'creative writing' LLMs" by nostalgebraist (https://www.tumblr.com/nostalgebraist/778041178124926976/hyd...), which features an example with "tapestry".

> Why is the instruction tuning pushing such a noticeable style shift?

Gwern Branwen has been covering this: https://gwern.net/doc/reinforcement-learning/preference-lear....

capnrefsmmat 102 days ago

Thanks for the links. You may be interested in the other LLM writing style studies I've been collecting: https://www.refsmmat.com/notebooks/llm-style.html

networked 102 days ago

You're welcome, and thanks. I've added a link to your notebook to my page.

grey-area 102 days ago

I wonder if the style shift has anything to do with training for conversation (i.e. tuning models to respond well in a chat situation)?

capnrefsmmat 102 days ago

Probably. One common feature of LLM output is grammatical features that indicate information density, like nominalizations, longer words, participial clauses, and so on. Perhaps training tasks that involve asking the LLMs for concise explanations or summaries encourage the use of these features to give denser answers.

red_hare 102 days ago

I wonder if it has to do with how meaning is tied to the tokens. c+amara+derie (using the official gpt-5 tokenizer).

There's also just that weird thing where they're obsessed with emoji, which I've always assumed is because they're the only logograms in English and therefore have a lot of weight per byte.

astrange 102 days ago

OAI puts instructions in the system prompt to use or not use emoji depending on your style settings.

kristianp 101 days ago

> It's something in instruction-tuning that's doing it.

Isn't the instruction tuning done with huge amounts of synthetic data? I wonder if the lack of diversity comes from LLM-generated data used for instruction tuning.

albert_e 102 days ago

There is an organization named Tapestry (parent of Coach Inc).

Wonder how they can avoid the trope while not censoring themselves out.