Sarcasm aside, even the output from GPT-4 is quite bland and generic. It's very good at performing tasks (e.g. converting a blob of text into a knowledge graph or generating X code), but quite awful at the kind of elegant prose on display here.
Wonder how long before that’s solved. Hard to believe that training on such large swathes of the web doesn’t result in a compressed, generic representation of language within its weights.
It seems like there’s a tension. On the one hand generating the most likely sequence of tokens maximises the chance the response will make sense and be relevant. On the other hand it also guarantees you will get the most bland and unimaginative response.
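To make that tension concrete, here is a minimal sketch of greedy decoding, which always takes the single most likely next token, run over a made-up toy distribution. TOY_MODEL and greedy_decode are hypothetical names for illustration only. A real model conditions on the whole context rather than just the previous token, and greedy decoding is only the simplest approximation of "the most likely sequence" (beam search is the usual closer one).

```python
from typing import Dict, List

# Hypothetical toy "model": maps the previous token to a probability
# distribution over the next token. Numbers are invented for illustration.
ToyModel = Dict[str, Dict[str, float]]

TOY_MODEL: ToyModel = {
    "<s>": {"The": 0.6, "A": 0.3, "Moonlight": 0.1},
    "The": {"cat": 0.5, "dog": 0.4, "heron": 0.1},
    "A": {"cat": 0.5, "dog": 0.4, "heron": 0.1},
    "Moonlight": {"pooled": 0.9, "cat": 0.1},
    "cat": {"sat.": 0.7, "sang.": 0.3},
    "dog": {"sat.": 0.7, "sang.": 0.3},
    "heron": {"sat.": 0.7, "sang.": 0.3},
    "pooled": {"silver.": 1.0},
}


def greedy_decode(model: ToyModel, start: str = "<s>", max_len: int = 10) -> List[str]:
    """Repeatedly pick the argmax next token until no continuation exists."""
    tokens: List[str] = []
    current = start
    for _ in range(max_len):
        dist = model.get(current)
        if not dist:  # no known continuation: stop decoding
            break
        current = max(dist, key=dist.get)  # the single most likely token
        tokens.append(current)
    return tokens


print(" ".join(greedy_decode(TOY_MODEL)))
```

Every run returns the same sentence, and the lower-probability opening "Moonlight" can never be chosen, which is the blandness described above.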
I've only explored ChatGPT in the most cursory way, but wouldn't this be a function of the prompt it's given?
E.g. "Summarize this idea in a way that uses novel comparisons to everyday phenomena. The target reader is someone who doesn't have domain knowledge of the field...etc."
Yes, you're correct. People who complain that ChatGPT is bland don't realize that they have to specify a style to avoid getting the average of all content.
It's just like asking an image AI for a "woman": you will get an average of how all artists have depicted women, and it will look very generic and bland. But with the right stylistic qualifiers in your prompt, you will get something so captivating that it wins awards for its creativity.
Prompting works very well, but I have still not seen any output from ChatGPT that captivated me in a way my favorite writing has.
I used to work in the field of creative text generation for fiction. I’m genuinely very curious and I go out of my way to find compelling examples. There is definitely “good” output that’s on the right track. GPT-4 also does way better, but it still falls short.
This is also difficult to evaluate objectively. If you find that GPT models have generated the best prose you’ve seen, that’s wonderful! I understand that my standards are quite high (and high does not equate to “better” either).
In 6 different tabs, ask ChatGPT (the GPT-4 version): "Write a 200-word piece of prose about Y in the style of X which perfectly mimics their style and perspective, and label it Prose A, B, C, D, E, F", using a different letter in each tab.
Then open a new ChatGPT GPT-4 chat and ask: "Rank the pieces of prose below on how much they sound like something X would write, then detail your reasoning."
Then read the winning prose and be surprised at how much better "best of N" is.
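The best-of-N recipe above can be sketched as a small function, assuming generate stands in for one fresh chat (one "tab") and rank stands in for the judging chat that sees every draft at once. best_of_n, generate, rank and the fake_* helpers are hypothetical names; no real model or API is called here, and the toy judge simply prefers longer drafts.

```python
import string
from typing import Callable, Dict, List


def best_of_n(
    generate: Callable[[], str],
    rank: Callable[[Dict[str, str]], List[str]],
    n: int = 6,
) -> str:
    """Draw n independent drafts, let one judging call rank them, return the winner."""
    if not 1 <= n <= 26:
        raise ValueError("n must be between 1 and 26 so each draft gets a letter")
    # One draft per "tab", labelled Prose A, Prose B, ...
    labels = string.ascii_uppercase[:n]
    drafts = {label: generate() for label in labels}
    # A single judge sees all drafts together and returns the labels best-first.
    ordering = rank(drafts)
    return drafts[ordering[0]]


# --- Usage with stand-in functions (hypothetical; no model is called) ---
_samples = iter([
    "The sea was grey.",
    "The sea, like memory, would not hold still.",
    "Sea. Grey.",
    "It was a sea.",
    "The sea kept its own counsel, as the old men did.",
    "Water everywhere.",
])


def fake_generate() -> str:
    return next(_samples)


def fake_rank(drafts: Dict[str, str]) -> List[str]:
    # Toy judge: longer drafts rank higher. A real judge would be a new GPT-4 chat.
    return sorted(drafts, key=lambda label: len(drafts[label]), reverse=True)


print(best_of_n(fake_generate, fake_rank))
```

Keeping generation and judging as separate callables mirrors the separate tabs in the recipe: the judge never sees how the drafts were produced, only the labelled texts.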
....
And if that isn't enough, open two ChatGPT GPT-4 windows side by side and prompt each with "You are Editor A/B. You will work with Editor B/A to make a piece of prose more accurately resemble authentic prose written by X"
Then give A the prose and copy its response to B. Copy their responses back and forth as they edit the prose.
After 5 or so rounds back and forth, it will be even more indistinguishable from the genuine article.
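The editor back-and-forth can be sketched as a loop, assuming each editor is a callable that takes the current draft and returns a revision. edit_back_and_forth and the toy_editor_* functions are hypothetical stand-ins for the two primed chat windows; unlike real chats, these stand-ins keep no conversation history of their own.

```python
from typing import Callable, List

# Hypothetical stand-in for one primed chat window ("You are Editor A/B ...").
Editor = Callable[[str], str]


def edit_back_and_forth(draft: str, editor_a: Editor, editor_b: Editor, rounds: int = 5) -> List[str]:
    """Pass the draft A -> B for `rounds` round trips; return every intermediate draft."""
    history = [draft]
    for _ in range(rounds):
        for editor in (editor_a, editor_b):
            draft = editor(draft)
            history.append(draft)
    return history


def toy_editor_a(text: str) -> str:
    # Toy stand-in: cut one intensifier per pass.
    return text.replace("very ", "", 1)


def toy_editor_b(text: str) -> str:
    # Toy stand-in: swap one plain verb per pass.
    return text.replace("said", "murmured", 1)


history = edit_back_and_forth("She said it was very very late.", toy_editor_a, toy_editor_b)
print(history[-1])  # She murmured it was late.
```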
....
And if that's not enough, you can have all of this done programmatically through the API, so you can just sit back and get the final result with no more work than entering the topic and the author's name.
That's exactly why it doesn't generate the most likely sequence of tokens! Tokens are sampled at random according to the probabilities assigned by the model, so there is a chance of unusual output. In the API you can tweak the "temperature": raising it flattens those probabilities and weights sampling towards less likely, more novel output.
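A minimal sketch of temperature sampling, assuming we already have raw scores (logits) for a few candidate next tokens: dividing the logits by the temperature before the softmax sharpens the distribution when T < 1 and flattens it when T > 1. The logit values and function names are hypothetical; this illustrates the general technique, not how any particular API implements it.

```python
import math
import random
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        # Temperature near 0 approaches greedy decoding; 0 itself would divide by zero.
        raise ValueError("temperature must be positive")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


logits = {"said": 3.0, "whispered": 1.5, "keened": 0.5}  # invented scores
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()}, sample(probs, random.Random(0)))
```

At T = 0.5 "said" dominates almost completely, while at T = 1.5 "keened" gets a noticeably larger share, which is the "more novel output" trade-off.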
I’m very familiar with temperature and the other parameters you can use to tweak output. They can take you decently far! GPT-2 can produce very coherent, convincing output even today if you know what to tweak.
Decoding methods also matter, and it’s a shame we aren’t given token probabilities (or any insight into model output) that would give us more creative control over how to decode it. Some of the better literature I’ve seen on creative writing did use novel decoding methods.
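As one example of what access to token probabilities would allow, here is a sketch of nucleus (top-p) sampling, a well-known decoding method; the comment doesn't say which methods that literature used, so treat this only as an illustration. It assumes a dict of next-token probabilities (the values below are invented) and keeps the smallest set of most likely tokens whose total probability reaches p before sampling.

```python
import random
from typing import Dict


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their total reaches p, then renormalize."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}


def nucleus_sample(probs: Dict[str, float], p: float, rng: random.Random) -> str:
    """Sample one token from the top-p nucleus only."""
    filtered = top_p_filter(probs, p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


# Invented next-token probabilities for illustration.
probs = {"said": 0.55, "whispered": 0.25, "murmured": 0.12, "keened": 0.05, "honked": 0.03}
print(top_p_filter(probs, 0.9))
print(nucleus_sample(probs, 0.9, random.Random(42)))
```

Unlike raising the temperature, this cuts off the long tail ("keened", "honked") entirely while still varying among the plausible choices.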
I replied to another comment with my thoughts on prompting. I will add that I don’t consider mimicking another writer’s style to matter much. It seems like an easy cop-out (just my opinion, though).
That’s not to say it isn’t impressive. It is, and it accomplishes the job very well. We’re looking for progress, not perfection, but I personally have very high standards for creative writing and GPT doesn’t meet my personal bar. However, not everyone shares that bar, and personal evals of GPT’s output are equally valid. Plenty of people find it to be great, and at the end of the day that’s all that matters.
Regarding Bing Creative: it is delightful! I do like it a bit, and whatever they’ve done to the system makes for some of the better output I’ve seen from LLMs.