by successful23 440 days ago
I’ve noticed the same contrast - technical writing from LLMs often needs trimming for clarity, but creative writing can lean too far into either bland or overly flowery language.

Most LLM benchmarks lean heavily on fluency, but things like internal logic, tone consistency, and narrative pacing are harder to quantify. I think using a second model to extract logical or structural assertions could be a smart direction. It’s not perfect, but it shifts focus from just “how it sounds” to “does it actually make sense over time.” Creative writing benchmarks still feel very early-stage.


I’ve also noticed that with longer-form text, the amount of meaningful information seems to plateau — it doesn’t scale proportionally with the character count.

It probably depends a lot on how the system is prompted. One of the interesting things about generative images is how easy it is to know what something looks like without being able to describe it.

Longform text is likely similar: there are a bunch of interactions and scenes that humans pick up on when they are there, without being able to describe them. The early Game of Thrones series was a fascinating example of good writing because most of the terrible things that happened to people were a neat result of their own choices (it had a consistent flawed character -> bad choice -> terrible consequence pattern that repeated over and over), but I don't think most people would pick up on that without it being explicitly pointed out. And when that started to go away, people could tell the writing was falling off but couldn't easily pick out why.

A hypothetical LLM could be prompted with something like that ("your writing is boring, please make consequences follow from choices"), but it is less clear that the average prompter would be able to figure out that this was what was missing. It's like how image generators often needed to be prompted with "avoid making mistakes" to get a much higher quality of image; it took a while to realise that was an option.

That's my experience as well. If you feed in your summary + outline + guidance and prompt for a one-shot output, it'll rush through it. If you prompt it for a longer length, it'll extend it for little benefit. To get good output, you have to work in chunks, like a paragraph or a scene at a time, adjusting your prompt as you work through the outline.

That said, the resulting quality usually isn't so great that I want to put in the effort to do that, so I tend to interact with it in more of a choose-your-own-adventure way.

"don't use flowery or over exaggerated language" works well in my experience.

> but creative writing can lean too far into either bland or overly flowery language

It's a style that can work. Patricia A. McKillip's fantasy novels are so flowery that I have difficulty telling what's going on.

I've never read one of her science fiction novels, but I find it hard to imagine they're written similarly.

It's nice that for coding and math problems, Claude's internal monologue involves constantly analyzing and critiquing its own implementations. It doesn't seem as keen on critical self-analysis of its creative outputs.