I wonder which model was used for this? Based on the poem taking "10 seconds" to generate, I'd guess the free version of ChatGPT, meaning 3.5 turbo.
While I wouldn't expect Atwood's conclusions to change too much by using GPT-4 instead, I think it's interesting that even the majority of educated people and journalists outside of tech don't seem to realize that the best model is at least 10x smarter than the free version of ChatGPT, which is what they seem to be using for all their prejudice-confirming "experiments".
They also always seem to assume that if the output from whatever prompt they came up with can't reach X quality bar, that means it can't be reached by anyone else either with a different prompting strategy.
Not trying to throw any shade toward Ms. Atwood, who is one of my favorite writers, and I'm also not claiming AI will be writing as well as her anytime soon... just pointing out that if we want to really measure where we're at on tasks like this one, a more rigorous approach is needed.
My gut metric says it's a ~20% increase in perceived interpretation and output complexity, whatever that means exactly. But there are plenty of eval result aggregators out there.
To me GPT-4 seems actually intelligent and capable of reasoning, while GPT-3.5 does not. Many of my use cases involve giving large bodies of text to GPT and asking it to reason about them. 3.5 has no clue, but 4 seems to handle it intelligently.
Overall, GPT3.5 feels like a clueless summarizer, while GPT4 feels like an intelligent interpreter and reasoner that I can trust.
Depending on which way you look at it, it could be 10x or 1000x the intelligence.
I hear this a lot. I didn't notice a huge difference in quality with GPT4. Completely anecdotal, and it could have been a failure on my part to prompt that model effectively. But I don't think it's safe to assume the results are a 10x improvement.
I have. I don't propose some kind of scientific measure but I do have two data points to contribute:
First, I've been using GPT to build an application for work for the past few months, and anything but GPT-4 produces noticeably less consistent and reliable output. Things like occasionally producing malformed JSON.
Second, I have a set of questions I use to evaluate models testing different capabilities and GPT-4 does much better than other models, particularly at coding tasks. There are some exceptions, for example, Bard has been able to do better on stating facts sometimes and Claude has done better at summarizing long text.
I'd love to have another model as good as GPT-4 to use but I haven't found one yet.
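For anyone hitting the malformed JSON problem mentioned above, here is a minimal sketch of one way to guard against it: try to parse each reply and fall back to the next one. The helper names `parse_model_json` and `first_valid` are made up for illustration, and the replies are plain strings, so no particular model API is assumed.

```python
import json


def parse_model_json(raw):
    """Try to parse model output as JSON.

    Returns (True, value) on success and (False, None) on failure, so a
    legitimate JSON `null` is not confused with a parse error.
    """
    try:
        return True, json.loads(raw)
    except json.JSONDecodeError:
        return False, None


def first_valid(candidates):
    """Return the first candidate that parses, or None if none do.

    `candidates` stands in for repeated model replies (hypothetical);
    in practice each one would come from a retry of the same request.
    """
    for raw in candidates:
        ok, value = parse_model_json(raw)
        if ok:
            return value
    return None


if __name__ == "__main__":
    replies = ['{"title": "draft"', '{"title": "draft"}']
    print(first_valid(replies))
```

This only checks that the text is syntactically valid JSON; whether the fields are the ones you asked for is a separate check.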
> think it's interesting that even the majority of educated people and journalists outside of tech don't seem to realize that the best model is at least 10x smarter than the free version of ChatGPT
I mean... the content-free drivel they generate is more _polished_, possibly, though I'm not sure this is actually an improvement. What do you mean by 'smarter', here?
I think a while ago I commented something along the lines of "let me know when we see a successful book/article/speech which gets revealed to be largely AI-generated". And that hasn't happened yet: AI-generated content has always been noticeable and generally considered bad ("was X written by ChatGPT?" is an insult).
But I know AI is already being used to assist human writers, not just with boring emails and speeches, but creative works like articles and books.
Moreover, if AI ends up writing something decent, it won't be recognized as AI-written. And the human "author" probably won't be quick to reveal it, both because of the controversy surrounding AI and because people would then over-scrutinize it and point out mistakes which even a human would make (or really minor opinionated things they call mistakes just to have a point).
Going back though, if AI ever does get to the point where it can replicate human talent, eventually we're going to know. If GPT-5 exists and is able to replicate human-quality writing, it's only a matter of time before someone reveals it, or a competitor catches up and then they reveal it.
LLMs can hit human-quality writing just fine (not professional yet). The top LLMs today are all deliberately trained to sound bland, robotic and uninspired with RLHF etc. It's just the default voice, not some weakness of LLMs, and it's not very hard to make them not sound like that.
The Weeping Willows of Winnipeg is a shit story, but if you were working on a short story, and you got 5 suggested rewrites for a given paragraph, or you were looking for ideas for a plot point or something similar, then you could use ChatGPT to help you out.
In exactly the same way, sometimes I give ChatGPT a complete coding task and it can't do the job. But while I'm working on code I can get it to do certain things and it saves me a lot of time and sometimes comes up with very useful insights and things I was unaware of.
I'm sure authors (or anyone else whose job maps to "language processing") can use it similarly.
My current job maps in that direction--translation and writing. Lately I have been using GPT-4 to produce first drafts. It saves me time and effort and gives me ideas for expressions that I wouldn't have thought of on my own. It's also good for writing in genres that are not my strongest. I wrote a PR brochure recently, and GPT-4's drafts had more advertising punch than what I usually produce.
I still spend a lot of time polishing the final version, so the time savings are only about twenty percent.
It's most powerful as a tool for transforming and projecting text with text. It also excels at lateral exploration, enumeration, comparative critique. But yeah, don't ask it to write you a story and expect a gem.
Reading this, I realize that the "prompt" is what's lacking, in terms of the output produced. A writer has a prompt too -- which is created through the entirety of their experience leading up to the moment they sit down to write, often heavily weighted towards the hours leading up to that moment. AI will catch up to actual human writers when it can spend a day's visual, auditory, and linguistic stimuli crafting the prompt that leads to the moment of creation.
Beautifully said. I can add only that the right pattern of prompts may drive the creation of legitimately useful and even beautiful work. Humans need to guide the machines, maybe precisely because they lack the evolutionary drive that gets us up in the morning and keeps us cleaning our babies' behinds.
More like a constant chain of prompts, except the prompts are exponentially more data and each output is combined with more sensory inputs for another inference and they happen in real time.
Yes
When I have a conversation with GPT4, it keeps the history, and the entire history is a prompt for the next prompt.
I would say, not totally unlike a human memory.
It's just that GPT4 is limited to 8000 characters or something.
Humans have a bit more, but with added 'forgetfulness', so who knows how many characters for real. Humans don't actually recall perfect text very well.
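To make the "entire history is a prompt" idea concrete, here is a rough sketch of how a chat front end might keep only the most recent messages that fit in a fixed budget. Everything here is an assumption for illustration: the function name `trim_history`, the whitespace word count standing in for a real tokenizer, and the budget value itself (real context limits are counted in tokens, not characters, and vary by model).

```python
def trim_history(messages, budget, count=lambda text: len(text.split())):
    """Keep the newest messages whose combined cost fits within `budget`.

    `count` is a crude stand-in for a tokenizer (it counts
    whitespace-separated words); swap in a real one if you have it.
    Messages are returned in their original order.
    """
    kept = []
    used = 0
    # Walk backwards from the newest message, stopping at the first
    # message that would push us over the budget.
    for message in reversed(messages):
        cost = count(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept


if __name__ == "__main__":
    history = ["tell me a story", "once upon a time", "make it darker"]
    print(trim_history(history, budget=7))
```

Anything that falls off the front is simply gone for the model, which is the "forgetfulness" without the aggregation that human memory does.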
But we do manage to remember stuff that happened "quadrillions" of tokens (of sensory input) back. Sure, we are wrong a lot, but we have a context that goes way back. An LLM has a fruit fly's version of what we work with, although in some versions it has perfect recall in that context.
Yeah, humans do a ton of aggregation and filtering. That's why you can't remember individual days of commuting: they are so similar that the brain just lets them go.
I'd think something like this would be added to the models eventually. Of course, over-simplifying.
I think that human memory "keeps" the next prompt by baking it into the parameters of the "neural net" (the brain) immediately
This has gotta be somehow an analogue of dreams and/or sleep and/or AI hallucinations.
I think this because for many of us, to not sleep triggers hallucinations, which is the conscious experience of whatever the organic version of 'backpropagation' (training the model?) really is.
It's unclear what model was used for the outputs mentioned in the article, but here's the short story that Claude gave me for the same prompt:
---
Here is a dystopian short story in the style of Margaret Atwood set in Canada:
The Snows Came Early That Year
The snows came early that year, blanketing the prairies in white before the harvest was complete. I awoke to howling winds rattling the panes of my solitary cabin, the radio readout blinking ominously, batteries too weak to check the weather reports again. I shuffled outside in layers of wool and coveralls, squinting against the icy gusts, to find the chicken coop door ripped clean off. Feathers and blood painted the snow crimson. The four hens I had left were gone, with only scattered tracks and droplets leading into the tree line to indicate the predator's escape.
I cursed under my frosted breath, the bitter wind stealing away my words. Food had already been scarce, each winter harder than the one before. I shuttered the coop and turned back inside just as the power flickered and went out. Now there was no heat, no light, no way to cook. The solar panels would be useless under the heavy snows.
As I sat wrapped in furs, watching my breath turn to plumes of smoke in the gloom, I made a plan to walk to town when the blizzard passed. Perhaps barter some handicrafts for a few days of rations. The trek would be risky, but staying here held a different kind of peril. The food stocks in my locked pantry were dangerously low. There was no choice but to go.
The Katskil wolves would likely be on the prowl, their numbers multiplying as the animal population declined. Stay to the road, move quick, keep head down. Don't attract attention. Don't look too frail, too thin, too cold. The robotic guards at the checkpoints would be indifferent to my starvation, but opportunistic thieves might not. This was the world we inhabited now. Hold fast to what is yours. Trust no one. Every day another fight to survive. My grandmother's voice echoed in my mind...from her days before the collapse: "Don't lose hope, girl. It's always darkest before the dawn." But that was in another era, now lost to us. There were no more dawns ahead, only the howling dark.
I'm not sure if you're trying to say that this output is better, but in my opinion it is very much not.
I don't think that someone who wrote "you fit into me" would use the abundance of qualifiers that this text has. Furthermore, there's nothing of Atwood's style that I can detect in it. The only interesting bit for me is the inversion of the assumption that the voice speaking to us is male, but I feel like even that's too lame a device for a veteran author to use.
I don't know anything about poetry but I do know you have to put some effort into a prompt and iterate to get anything useful out.
Here are some interesting results from literally feeding ChatGPT this article and prompting it to improve the poem, taking her critique into account. They seem, at least to my eyes, a little better.
Upon these shores, where time seems still,
PIBO stands, guided by a passioned will.
With hands ungloved, their touch sincere,
They embrace nature, drawing it near.
Loving straightforwardly, without pretense or guise,
In every bird's song, a part of them lies.
In this dance, where boundaries fade and intertwine,
Your journey, dear bird, feels much like mine.
In the quiet of Pelee, where heartbeats merge with the sea's hum,
There lies a tale of dedication, of hands and feathers becoming one.
Upon Pelee's shores, where the migratory journey starts,
PIBO stands undeterred, with gentle, gloveless hands and hearts.
No insect repellant, for it might be toxic, they bear,
Yet with every sting and bite, their dedication's rare.
Warblers greet the dawn, singing their age-old song,
Yet amidst them, mosquitoes and chiggers throng.
With delicate touch, they band, note, and release,
Suffering in silence, so our joy in birdsong doesn't cease.
In the hum of nature and the dance of skies so grand,
Echoes the profound mission, of PIBO's ever-watchful band.
It reads like someone tried to commit poetry by rhyming dot points in a list. Except they got a discount bulk-rate on purchasing commas at the grammar store so they stitched it all together with 50 of them and called it done :P
Plus, like almost all of its output, it appears to have the amount of profundity, aesthetic pleasure, insight and interest usually found in corporate boardrooms and marketing brochures. That is to say, vanilla dross.
Good gracious. I think this poem might be worse than the original.
Indirection, subtlety, and allusion are absolutely critical elements in poetry - maybe even fundamental.
There is not a hint of that in this autocompleted piece of dreck. It's more blatant than the terrible evangelical preaching-"poems" I used to see forwarded on AOL in my teens.
After reading the end of this, I have the feeling I should read some classroom ChatGPT's version of 'children's stories in the style of H.P. Lovecraft'. Halloween is near anyway.
It is a callback to the first paragraph in the article, where Atwood makes a reference to the movie Invasion of the Body Snatchers, in which an alien race breeds clones meant to replace people in society.
I'm quite impressed with "The Weeping Willows of Winnipeg", I'm happy to commend ChatGPT on it more than Atwood does. Sure, Atwood is an acclaimed professional, but hey, I've written dystopian short stories worse than that! Although mine probably involved less plagiarism than ChatGPT's.
I found myself unable to read the whole thing. I'm not sure exactly why, but it was a tedious task to keep reading it, so I skipped past the last half to Atwood's commentary.
Wait, she's being serious about the criticism? It actually seems tongue-in-cheek to me; those are perfectly serviceable bits of text that are an existential threat to mediocre wordslingers the world around.
Are there any turnkey engines designed to run locally which can be trained on your own data? I've been itching to put my work into one, just to see what the results might be.
In the interface, use ExLlama for GPU inference (fast; for smaller models which fit in VRAM). Llama.cpp for large models (higher fidelity but slower), CPU+GPU.
A 13B parameter 4-bit quantized model (type "GPTQ") can fit in a 12GB RTX 3060. A 24GB card (e.g. a 3090) is needed for a 30B model on GPU. Something like 5-10 tokens/sec.
You can run 65B or 70B parameter models on CPU (e.g. an i7 12700) with 64GB RAM (you also need a decent GPU as above). Around 1 token/sec. These models are type "GGML" / "GGUF".
Long prompts take a long time for initial ingestion on CPU+GPU, much faster on GPU only.
Llama.cpp also apparently runs very well on Apple silicon, with the shared memory between CPU and GPU being well-suited.
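The card and RAM sizes above roughly follow from back-of-the-envelope arithmetic on the weights alone. A small sketch, assuming exactly 4 bits per weight and ignoring everything else that needs memory (context cache, activations, quantization metadata), so treat the numbers as lower bounds. The function name `weight_gib` is made up for illustration.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GiB.

    Assumes every parameter is stored at `bits_per_weight` bits; real
    quantized files carry extra metadata and are somewhat larger.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    for size in (13, 30, 65, 70):
        print(f"{size}B at 4-bit: about {weight_gib(size, 4):.1f} GiB of weights")
```

Under those assumptions a 13B model needs about 6 GiB for weights and a 30B model about 14 GiB, which is consistent with the 12GB and 24GB cards mentioned, while 65B to 70B lands above 30 GiB and so ends up mostly in system RAM.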
It's funny reading articles where the author is directing smug superiority at an ML model. "Haha, Toyota Camry! You may be able to move faster than I can run, but can you move up a ladder?"
That criticism reminds me of articles/videos where professional musicians criticize famous hit songs. Musicians will sometimes comment that the guitar is ahead of the drums or that the vocal doesn't quite hit the notes: things that I, as a non-musician, don't hear at all. A common theme in those reviews is that one should write music for the audience, not for other musicians.
I'm not a native speaker and many oddities of the text may go past me, but I can say that for me the commented texts (especially the 2nd one, about post-apocalyptic Canada) are completely passable and much better than what I will ever be able to write.
Yes, it may be not a threat (yet) to professional, especially established author. But they will be good helpers for people like me, who can get suggestions, improvements and illustrations just for the price of my 4090 and time to tinker with models.
I find your reaction intriguing since I had nearly the opposite reaction. Especially when it comes to the second piece I had little to critique about it technically (maybe a little from a structural standpoint but not as much grammatically), but I found it incredibly dull due to its complete lack of any real point. With your musical analogy, it felt to me like a piece where all of the instruments were played perfectly, but failed to make a song I actually would care to listen to again.
> But they will be good helpers for people like me, who can get suggestions, improvements and illustrations just for the price of my 4090 and time to tinker with models.
This I agree with; for someone who may not be as gifted a writer, but still has something interesting to say, generative models could help with that. I just hope that people don’t lean on these models for generating ideas because if that story was any indication, that’ll just lead to a proliferation of boring, soulless works.
Agree. This reminds me of a kid's school assignment, which sounds impressive, but really misses the point. A kid with no real writing talent or experience given this same prompt would also cargo cult some uncohesive drivel like this.
That lines up pretty well with how much ChatGPT responses ape the stereotypical school essay style of being 1/5th introduction, 3/5ths content, and 1/5th conclusion that reiterates the content.
Even if there weren’t stylistic and logical problems with the story, I don’t think I’d be terribly interested in reading it.
Fundamentally, literature is about communication with other people, living in another person’s mental world or understanding their unique perspective.
It’s not really clear to me what human value an LLM generated story has. It’s a statistically probable sequence of tokens generated from the distribution of internet-based language. It had no unique perspective, and conveys nothing about actual human experience. What do I learn from that? How is my life enriched?
There's a segment of generative AI proponents who, on a fundamental level, genuinely do not understand art. All art is communication, and without humanity it is meaningless.
LLMs are really cool for creative brainstorming and stuff like that, as a tool for inspiration, but I am baffled by the idea that anyone is interested in entire AI-generated works.
That you are nothing special. The computer is just doing what you do, putting one word after the last one. Picking each word more or less carefully, based on its own training and the audience's expectations.
It's doing it badly, for the moment. But would we mock a talking dog who stutters?
> How is my life enriched?
Being able to see what's coming is helpful, more often than not.
You’re describing human communication as “nothing special”, just a string of patterns that have no significance beyond their low entropy.
Frankly that’s solipsistic, bordering on pure nihilism. When I read another person’s writing or talk to them, it enriches my life because it gives me a slice of their experience. When I read LLM output, it just definitionally can’t do that, no matter how plausible and semantically meaningful the words are. What is the purpose of literature for you? Just consuming a pleasing string of words?
That’s not to say they have no value, just that I can’t learn anything about another person’s experience of life through an LLM, because they aren’t people
When it creates a poem, song, essay, painting, or program that has never existed before, it is doing what we do. Whether or not we define that as "creating new information" is not an especially interesting question at the end of the day.
> Yes, it may be not a threat (yet) to professional, especially established author. But they will be good helpers for people like me, who can get suggestions, improvements and illustrations just for the price of my 4090 and time to tinker with models.
I think that's more or less what Atwood was saying: that it's not yet good enough to replace talented professionals. I doubt she'd argue that it can do the other things you mentioned.
She seems to feel the gap is really large right now. Even if it is as bad as she indicates, I wonder how long it will be before the gap is closed given the progress we've seen recently.
The feeling among writers generally (and among companies that thought, for a time, they could fire most of their writers and editors…) is that LLMs need another couple huge upheavals in how they operate, to compete with and replace skilled humans rather than being a 1.1x multiplier for a skilled human. Their current trajectory isn’t making them all that worried, now that they’ve had some time to see what LLM text generators can actually do (which isn’t much).
Musicians write for other musicians. Coders use the best data structure to solve a problem not for the users but for another coder to appreciate. You weigh praise on the merits of the judge to do the judgement. People in the end listen to the music musicians listen to, even if this is with a lag. I am not a musician but I appreciate a musician’s depth of understanding for something I superficially let stroke my ears.
> Coders use the best data structure to solve a problem not for the users but for another coder to appreciate.
I find that sentiment mildly offensive. I write a lot of code myself, but for every program I write, I use a hundred programs written by other people. If those programmers want to impress me, they will keep my needs as a user in mind, not their own egos.