Hacker News new | ask | show | jobs
by substation13 1129 days ago
It's really easy to get an LLM to hallucinate by asking an open ended question - the type typically answered by a Google search or checking Wikiedpia. However, this is not the best application of LLMs. This criticism is getting old.

LLMs are great at:

- Text synthesis given all of the facts in a prompt (expand these bullet points)

- Summarization (condense this text)

- Data extraction (fit this data into this schema)

- Fiction (virtual characters, scripts, etc.)

They will dramatically change these industries.

4 comments

> - Fiction (virtual characters, scripts, etc.)

I've found it bad for this, does not generate something I'd actually read.

It seems to do ok at coming up with starting points or give you options if you're stuck. But the quality of the prose it comes up with is indeed awful. It gets a bit better if you ask it to write in the style of a specific author, but marginally so.

I guess maybe it gets to mediocre fan fiction level.

That's still pretty impressive, but not very usable for creative writing yet.

Don't most create writers have a process?

I wonder if there's a series of steps(prompts) that could be used to get it to out put something much better. I know I've used it to write something I would have never even attempted let alone tried to write on my own. It came out ok, but better than I could have done on my own.

Some do. Others will just sit down and write. But the problem isn't so much that it can't handle plot. That is amenable to process - there are huge numbers of different processes, and one that might work well for GPT is something called the Snowflake method, which is basically iterative refinement. E.g. start with a one line description, expand it to a paragraph, expand each paragraph to a paragraph, then to a page or a few, and eventually to a list of scenes, and write out the scenes. Oversimplified (there's some steps with character sheets etc. too).

For that it might well be useful, because you could do one iteration at a time, edit the output to keep/reject ideas and do the next step.

But the challenge is that while it might not be "easy", it's the less time consuming part of a novel (certainly has been for me). The time consuming part is writing out the scenes, and the part GPT so far is awful at is the prose. So even if you manage to get it to produce a coherent script setting out what should happen, you still (so far) will have to expect to rewrite the entire thing anyway. That may or may not be useful to you. For my part I suspect I'd write faster from scratch than trying to edit and keep it consistent.

That said, given how far it's gotten I wouldn't at all be surprised if it can get to reasonable prose in another couple of versions.

It is good at inferring the correct people into a story. But the story many times leaves something to be desired.

Other times though I did have a lot of fun having it spit out SCP stories. As those can many times have a ton of template like logic to them. Due to the nature of SCP being written in a tone of a formal report. Plus well over 2000 different examples.

Also some of that could be due to lack of training data. Like a TV show might be 4 seasons long and a particular character may have had 3 or 4 lines total. It would be like asking it to write a story about Boba Fett given the original 2 movies where he showed up and had maybe 1 or two lines. There just is not enough to extrapolate anything. But you ask it to write something about Harry Potter and it probably could get the style close enough as there is more training data.

My biggest grip is sometimes it just gets stuck in a loop. Once you are in one, the thing just will dump out the same hallucinations over and over.

Try Anthropic's Claude[1]. I've found it to be better at creative writing than GPT4 or even Claude+.

That said, it's still not great, though sometimes you can luck on to finding a gem in what it writes.

I've also had luck in giving it examples of the sort of thing I wanted it to write and asking it to write something similar, but with certain modifications that I wanted it to make.

Giving two or more examples and asking it to combine them is also fun.

[1] - https://poe.com/Claude-instant

I wonder if an LLM trained on your favorite author how many words/sentences paragraphs it could generate in the middle of a book that would be basically undetectable.
You don't even necessarily need to train it specifically on their writing. Just giving an LLM an example of the sort of writing you want and asking it to write something similar is sometimes enough.

But, yeah, training it specifically on a corpus of work would probably be even more effective.

I'd love to be able to do that and get output that's at least on the GPT4 level. I think we'd probably have to have a breakthrough in LLM architecture and/or some amazing advancements in hardware before it becomes practical and cost effective for individuals to train their own GPT4-level LLMs, though.

LLMs also hallucinate during summarization tasks, adding topics that were not in the original
I've built internal systems that do summarization based on knowledge retrieval systems for specific nonpublic corporate information.

With GPT-4, I find very little hallucinating. It very rarely deviates from the source material. Every time I've found something unexpected, there was a problem in the source material provided to the model.

"very little" is still an unacceptable amount for most fields.

Quantify "very little" over what time period, variations of use, fail states, sample size.

> However, this is not the best application of LLMs

It is, however, the commercial application everyone - including search engines! - is implementing.

To be fair, the ones I've seen use a form of point 1 (giving all facts in the prompt) by allowing for searching the web, which becomes a version of point 2 (summarization).
I can see LLMs as a novel front-end for a traditional search engine.
However, this is not the best application of LLMs.
Regarding the last point: What I still find the most entertaining is how easily you can change its personality, especially via the system prompt. You can get it to be rather snarky, even sometimes insulting, which makes for hilarious IRC bots.
In less then 20 tokens, you can get ChatGPT simply via the web interface to become snarky and swearing like a drunken sailor.

And it is indeed hilarious at times.