Hacker News
by pfannkuchen 109 days ago
Meta question: Can anyone comment on why ChatGPT produces such patterned writing? There are structures that it uses in nearly every response, and it’s obvious that much of this article was copy-pasted from its output. But the corpus LLMs are trained on doesn’t have these patterns, at least not nearly at the frequency that I think would be required to produce them so consistently in the output. Does anyone know why this happens?
2 comments

The average of 2 and 3 is neither 2 nor 3, but somewhere in the middle?
I don’t really see how a near-infinite corpus that only occasionally contains that pattern could end up with it so heavily represented in the output.

I could see this being an argument for why it ends up being bland or having an inconsistent style (for example).

So I suppose that could be a theory: what the model produces naturally has a structure that needs to be corrected for, and right now the mechanism that applies the correction is overly simplistic and heavily overweights examples of the corrected structure, even though that structure does not arise naturally from the model.

I got the impression it's due to the reinforcement learning step, where it's taught to follow instructions, which rewards the model for such writing.
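
To make that concrete, here is a toy sketch with made-up numbers, not anything measured from ChatGPT. It assumes the fine-tuning step behaves like the textbook KL-regularized reward objective, whose optimum reweights the base distribution by exp(reward / beta). The names `reweight`, `base_probs`, `rewards` and `beta` are all hypothetical.

```python
import math

def reweight(base_probs, rewards, beta):
    """Tilt a base distribution toward high-reward outcomes.

    Implements p(x) proportional to base(x) * exp(reward(x) / beta),
    the closed-form optimum of a KL-regularized reward objective.
    Smaller beta means the reward dominates the base distribution more.
    """
    weights = {k: p * math.exp(rewards[k] / beta) for k, p in base_probs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Assumption: the pattern is rare in the base model's output (2%),
# but a hypothetical reward scores it one point higher than plain writing.
base_probs = {"patterned": 0.02, "plain": 0.98}
rewards = {"patterned": 1.0, "plain": 0.0}

tuned = reweight(base_probs, rewards, beta=0.2)
print(tuned)
```

With these assumed numbers the patterned share goes from 2% to roughly three quarters of the output, so a pattern can be rare in the corpus and still dominate after the reward step. Whether real systems behave this way depends on the actual reward and training setup, which I don't know.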