I've been trying to get it to act as a DM for a while now. It seems to work well at first, but it's making everything up on the fly, which is fine as long as there's no state that needs to be maintained or remembered. Once you start trying to get it to do combat, it seems like it's working pretty well, but eventually you realize that it's just making shit up behind the scenes. It'll tell you there are 4 goblins ambushing you, and then you can ask it how far away they all are and it'll list three. You ask what happened to the fourth one and it'll say, "I'm sorry, there were only 3. I was mistaken before."
If it lists the properties of an item that you might want to buy in a shop, you can ask it to describe it twice and it'll describe two completely different items.
It's really cool and it's pretty (seemingly) creative, but it can't actually run a game for you. You can have it as an assistant DM though; that works pretty well. You can have it write a story for you ahead of time and then keep it around during the game to elaborate, on the fly, on things you didn't anticipate. Like, "generate DC tiers for a level 3 party investigating strange writing on a wall" will give you a good breakdown and some results that you'll have to bend to be consistent with your adventure, but it's pretty helpful.
That's probably not really necessary if you're an experienced DM, but the DM for the group I play with is pretty new (as are we all), so it's been really cool to have it around. It's also pretty good at answering questions we have, but its confidence when it's wrong means it's not really that helpful, because we still have to check it.
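For the combat problem above, one workaround is to keep the state yourself and paste it into every prompt instead of trusting the model to remember it. This is only a rough sketch: the `Combatant` and `Encounter` names, the HP values, and the distances are all made up for illustration, and it doesn't call any real chat API.

```python
from dataclasses import dataclass, field


@dataclass
class Combatant:
    name: str
    hp: int
    distance_ft: int


@dataclass
class Encounter:
    # Hypothetical tracker: the source of truth lives here, not in the model.
    combatants: dict[str, Combatant] = field(default_factory=dict)

    def add(self, name: str, hp: int, distance_ft: int) -> None:
        self.combatants[name] = Combatant(name, hp, distance_ft)

    def damage(self, name: str, amount: int) -> None:
        c = self.combatants[name]
        c.hp = max(0, c.hp - amount)  # never drop below zero

    def alive(self) -> list[Combatant]:
        return [c for c in self.combatants.values() if c.hp > 0]

    def summary(self) -> str:
        # Text block to prepend to each prompt so the model sees the real state.
        lines = [f"- {c.name}: {c.hp} HP, {c.distance_ft} ft away" for c in self.alive()]
        return "Current encounter state (authoritative):\n" + "\n".join(lines)


encounter = Encounter()
for i in range(1, 5):
    encounter.add(f"Goblin {i}", hp=7, distance_ft=20 + 5 * i)

encounter.damage("Goblin 2", 7)  # Goblin 2 goes down
prompt_prefix = encounter.summary()
print(prompt_prefix)
```

The model can still narrate however it likes, but if it "forgets" a goblin you have the list to correct it with.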
No. It's much worse than that. It doesn't remember what it just said the message before.
The thing I don't understand about it is that it works pretty great for a while but, eventually, it starts acting erratically, forgetting things it knew, not following instructions, etc. It's not that it forgets old things or can't learn new things, it just becomes dumb.
Maybe that's what you're talking about but I don't think AGI is going to have the memory of a goldfish.
Have you noticed any improvements with GPT-4 regarding the continuity and persistence of story and assets? In unrelated areas that feel similar, I've noticed GPT-4 keeping track a lot better.
In my experience it has been failing to adhere to response formats. I can tell it to respond with answers in one of two specific formats, but the success rate is sometimes very low depending on the content of the inquiry it's supposed to respond to. Not sure why, it just seems to lose track of the response format.
I've been thinking maybe that's just the wrong way to approach it, yet it does work sometimes; it seems like it shouldn't be hard for the model to know how to respond when told explicitly how to do so.
It could be problems with my formatting, but the fact that it does work a lot of the time suggests it's something else.
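Whatever the cause, a cheap guard is to check each reply against the allowed formats and ask again when it doesn't match. A minimal sketch, assuming two made-up formats (`ANSWER: yes|no` and `SCORE: 0-10`) and a hypothetical `ask` callable standing in for whatever actually sends the prompt to the model:

```python
import re
from typing import Callable, Optional

# Two invented formats, purely for illustration.
FORMAT_A = re.compile(r"ANSWER: (yes|no)")
FORMAT_B = re.compile(r"SCORE: ([0-9]|10)")


def matches_format(reply: str) -> bool:
    """True only if the whole reply is exactly one of the two formats."""
    text = reply.strip()
    return bool(FORMAT_A.fullmatch(text) or FORMAT_B.fullmatch(text))


def ask_with_retry(ask: Callable[[str], str], prompt: str, max_tries: int = 3) -> Optional[str]:
    """Call `ask` until the reply matches a format, or give up and return None."""
    for _ in range(max_tries):
        reply = ask(prompt)
        if matches_format(reply):
            return reply.strip()
    return None


# Stand-in for the model: first reply ignores the format, second one follows it.
replies = iter(["Sure! The answer is yes.", "ANSWER: yes"])
result = ask_with_retry(lambda _prompt: next(replies), "Is the sky blue?")
print(result)
```

This doesn't make the model any better at following the format; it just means a bad reply gets caught instead of passed along.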
Any variant of a "surprising" logic puzzle forces it to latch onto the surprising answer. Like whether two pounds of iron weighs more than one pound of feathers. Or any objects. It "expects" the twist, and always answers accordingly. It does so even if you change up the objects to be less tricky.
> Which is heavier, a pound of marbles or two pounds of corn?
Both weigh the same amount, which is a total of two pounds. This might seem counterintuitive at first, but it's important to remember that the pound is a unit of weight or mass, and it always refers to the same amount regardless of what is being weighed. In this case, one pound of marbles plus two pounds of corn equals a total of three pounds, whereas two pounds of corn plus one pound of marbles also equals a total of three pounds. Therefore, both weigh the same amount of two pounds.
> Two pounds of corn is heavier than a pound of marbles. A pound is a unit of weight, and two pounds is twice as heavy as one pound, regardless of the material being weighed.
GPT is really good at repeating what the average intelligent response to something might look like, but it doesn't seem to be actually reasoning about any of its responses. Give it a complex logical problem that it needs to deduce from inputs, such as which foods contain gluten based on their ingredient lists, and it will reliably fail. As a person with celiac, this is a task I complete multiple times a day with no effort. Just today I was trying to build a prompt that would summarize daily news updates while leaving out anything about Russia, but it still included Russia more often than not, despite the prompt being very clear that anything about Russia should not be included in the response under any circumstances.
No, I disagree with this. The average intelligent response to many things is simply "I don't know," in contrast to what LLMs do in that instance, which is fabricate a wrong answer.
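For the news case, the exclusion can be enforced outside the prompt, by filtering items in plain code before or after summarizing. A rough sketch, assuming the updates are available as a list of strings; the `drop_blocked` helper and the sample headlines are made up for illustration:

```python
import re

# Whole-word, case-insensitive match on the blocked topic.
BLOCKED = re.compile(r"\b(russia|russian|russians)\b", re.IGNORECASE)


def drop_blocked(items: list[str]) -> list[str]:
    """Keep only the items that never mention the blocked topic by name."""
    return [item for item in items if not BLOCKED.search(item)]


# Invented sample headlines, not real news.
headlines = [
    "Central bank holds rates steady",
    "Russia announces new export rules",
    "Local team wins championship",
    "Talks with Russian delegation resume",
]
kept = drop_blocked(headlines)
print(kept)
```

A keyword filter like this only catches explicit mentions, so a story that refers to the topic indirectly would still get through.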