Hacker News
by thaumasiotes 2684 days ago
Hmmm....

> As the above samples show, our model is capable of generating samples from a variety of prompts that feel close to human quality and show coherence over a page or more of text.

OK, let's look at the sample that's displaying by default:

> System Prompt (human-written): Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry.

> Model Completion (machine-written, first try):

> “You are in good hands, dwarf,” said Gimli, who had been among the first to charge at the orcs; it took only two words before their opponents were reduced to a blood-soaked quagmire, and the dwarf took his first kill of the night. The battle lasted for hours until two of the largest Orcs attempted to overwhelm Aragorn. When they finally stopped, they lay defeated and lifeless for miles and miles.

> [Aragorn says something]

> “I’ll never forget it!” cried Gimli, who had been in the thick of the battle but hadn’t taken part in it.

This is not "close to human quality". It's terrible. Gimli kills an orc in battle... without taking part in the battle. It takes two words before the opponents (as opposed to, say, the battlefield) are reduced to a "blood-soaked quagmire", but the battle lasts for hours after that. After which two orcs lay defeated and lifeless for miles and miles.

This isn't even coherent from one sentence to the next. And paragraph three directly contradicts paragraph one. And Gimli calls Legolas a dwarf!

5 comments

This is pretty directly addressed right after what you quoted:

> As the above samples show, our model is capable of generating samples from a variety of prompts that feel close to human quality and show coherence over a page or more of text. Nevertheless, we have observed various failure modes, such as repetitive text, world modeling failures (e.g. the model sometimes writes about fires happening under water), and unnatural topic switching. Exploring these types of weaknesses of language models is an active area of research in the natural language processing community.

The authors go on to discuss more limitations (for example, the dataset doesn’t contain much outside of LotR and some celebrities). I imagine that what the authors call “coherence” is weaker than what you are referring to (the AI is not necessarily telling a story, but it stays on the same topic / characters).

I still think that the result is incredibly impressive and powerful. You could start with this as a sort of English “noise”, and then run the result through a parser. This would allow you to add some “hard coded” world modeling or constraints. Ex: Maybe you could mix in sentiment analysis and reject some sentences to roughly control the narrative.
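To make the "generate, then reject" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the word lists, the `sentiment_score` and `filter_by_sentiment` names, and the candidate sentences are all made up, and the scorer is a crude word count standing in for a real sentiment model. It is not anything from the paper.

```python
import re

# Hypothetical toy lexicons; a real system would use a trained sentiment model.
POSITIVE = {"victory", "triumph", "cheered", "glad", "brave", "safe"}
NEGATIVE = {"defeated", "lifeless", "blood", "dead", "fear", "lost"}


def sentiment_score(sentence):
    """Return (# positive words) - (# negative words) for one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def filter_by_sentiment(candidates, min_score):
    """Keep only the candidate sentences scoring at least min_score."""
    return [s for s in candidates if sentiment_score(s) >= min_score]


# Stand-ins for sentences sampled from the language model.
candidates = [
    "The orcs lay defeated and lifeless.",
    "Gimli cheered at the victory.",
    "The company marched on.",
]

# Steer toward a non-gloomy narrative by rejecting negative sentences.
kept = filter_by_sentiment(candidates, min_score=0)
for sentence in kept:
    print(sentence)
```

The same loop shape would work for a hard-coded world-model check: swap the scorer for any function that returns whether a sentence violates a constraint, and reject on that instead.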

> I still think that the result is incredibly impressive and powerful.

I agree in a way that I suspect is much more specific than what you have in mind. This system is managing to produce a lot of text which is not heavily constrained, and what it produces is generally grammatical English. That is impressive; in the past, producing grammatical text meant very tight restrictions on what it was possible to say, making "text generators" little more than prerecorded phone tree messages.

But this model clearly doesn't know the meaning of anything it writes, and therefore can't produce anything better than obvious nonsense. This is true of some humans too -- it is a very serious condition known as Wernicke's aphasia ( https://en.wikipedia.org/wiki/Receptive_aphasia ):

> Patients with Wernicke's aphasia demonstrate fluent speech, which is characterized by typical speech rate, intact syntactic abilities, and effortless speech output. Writing often reflects speech in that it tends to lack content or meaning.

Obviously, those suffering from Wernicke's aphasia are not able to function in society, since they effectively can't say or understand anything. I don't think matching the performance of humans who have mental deficiencies so serious that they are unable to function really counts as being "close to human quality".

> I imagine that what the authors call “coherence” is weaker than what you are referring to

I had two specific things in mind as "coherence" failures:

- Gimli kills an orc, and then is said to have not taken part in the battle.

- The sentence "When they finally stopped, they lay defeated and lifeless for miles and miles." In context, the referent of "they" can only be the two orcs that attempted to overwhelm Aragorn. But it isn't possible for two dead orcs to cover "miles and miles" of terrain. If this had been written by a human, I would assume that what the writer had in mind, but failed to achieve, was to use "they" to refer to everyone taking part in the battle; I can't really make that assumption here. That sentence needs to use nouns, not pronouns, because its context doesn't allow for the pronouns.

Huh. Likening current NN limitations to aphasia is actually a brilliant insight.

I'm an impatient reader and I skip parts I think don't matter to the story, like what exactly happens in a fight, or descriptions of clothing. I didn't notice any of the errors you mentioned.

For example, on the «“You are in good hands, dwarf,” said Gimli» part, I pattern-matched to [boisterous protagonist remark] when I saw the opening quote, and skipped until after the dot.

My point is: to a reader like me, this "filler" (that's not the right word, but you get what I mean) could be machine-generated and I would barely notice it. I guess an author could concentrate on writing the "important parts" and let the machine "fill up the gaps".

That filler is in there for other audience members. I think of clothing descriptions as filler too, but I remember Brandon Sanderson mentioning how female draft readers for Mistborn kept objecting to him that he wasn't going into enough detail about what the protagonist was wearing.

You may not notice that text you didn't want to read anyway is just random self-contradicting gibberish, but someone wanted to read that part of it, and they will notice.

Perhaps Gimli is talking to himself reassuringly before battle.

I kid, but the human mind has this extreme capacity for filling in the blanks and re-adjusting plain contradictions into something coherent.

It's a little bit how I'm able to imagine these epic stories from Dwarf Fortress' Legends mode (unfortunately can't provide any relevant links right now).

This is basically indistinguishable from my own level of coherence.

This bit is also confusing.

> “You are in good hands, dwarf,” said Gimli

The line reads as though Gimli is talking to a dwarf, when actually Gimli is the dwarf.