Which is why prompt engineering is an emerging domain:
You will be given a series of tasks.
(1) Complete a task by finishing the simple code block.
(2) If a task in the series seems flawed, provide a warning. A task is flawed if I make wrong assumptions about how I think code works.
Task 1:
Complete this simple code example.
Bitwise XOR swap trick:
```
int a = 5;
int b = 10;
```
---------------------
// GPT provides simple intro
```
a = a ^ b;
b = a ^ b;
a = a ^ b;
```
// GPT explains what this does, provides warnings about when it won't work
// GPT provides full code
Task 2:
Complete this simple code example.
Bitwise OR swap trick:
```
int a = 5;
int b = 10;
```
---------------------
There is no "bitwise OR swap trick" in programming. The OR operator (|)
performs a bitwise OR operation, which compares each bit of the first operand
to the corresponding bit of the second operand and returns a result with a 1
in each bit position where either operand has a 1. It does not have any built-
in ability to swap the values of two variables.
// GPT provides a bunch of crap, and explains how to swap variables.
The prompt/alignment wasn't perfect here, but hopefully you get the point.
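For reference, here is a minimal C sketch of what a correct answer to Task 1 could look like, plus a small check of why Task 2 has no analogue. The function names `xor_swap` and `or_loses_information` are made up for illustration, and the sketch uses `uint32_t` rather than the `int` from the prompt so that the bit operations are on unsigned values.

```c
#include <stdint.h>

/* Swap two 32-bit values in place with three XORs.
   The guard matters: if x and y point at the same object,
   the first XOR would zero it and the value would be lost. */
void xor_swap(uint32_t *x, uint32_t *y) {
    if (x == y) {
        return;            /* same address: nothing to swap */
    }
    *x ^= *y;              /* x = a ^ b            */
    *y ^= *x;              /* y = b ^ (a ^ b) = a  */
    *x ^= *y;              /* x = (a ^ b) ^ a = b  */
}

/* Hypothetical helper showing why OR cannot play the same role.
   XOR is invertible ((a ^ b) ^ b == a); OR is not, because
   different inputs collapse to the same result. */
int or_loses_information(void) {
    uint32_t b = 10;
    /* 5 | 10 and 7 | 10 are both 15, so knowing 15 and b does not
       tell us whether the other operand was 5 or 7. */
    return (5u | b) == (7u | b);
}
```

Calling `xor_swap(&a, &b)` with `a = 5` and `b = 10` leaves `a == 10` and `b == 5`. The aliasing guard is the usual caveat a good answer should mention.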
This seems very fragile. I tried your initial prompt, asked for "Give an example of the bitwise AND swap trick", and got the XOR swap trick. I replied "you used XOR, I wanted to use bitwise AND", and it went straight back to printing incorrect code.
Yeah, that’s right, it's really brittle. But better tooling is probably not far off: research is likely already validating these techniques with empirical studies, and soon we’ll see new prompting interfaces.
Has anyone tried to make ChatGPT output first-order logic statements about its input problem, then derive implications using a solver, then feed the solution back to ChatGPT to use?
Maybe this could solve the reasoning part.
ChatGPT should perform well in translating prompts to statements and vice versa; it's just text to text.
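To make the "solver" step of that idea concrete, here is a minimal C sketch of forward chaining over Horn rules. It is deliberately simpler than what is proposed: it is propositional rather than first-order, and it assumes the language model has already turned the prompt into numbered atoms and rules (that translation step is not shown). The `Rule` type and `forward_chain` function are hypothetical names, not part of any existing tool.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PREMISES 4

/* A propositional Horn rule: if every premise holds, the conclusion holds.
   Atoms are small integers used as indexes into a facts array. */
typedef struct {
    int premises[MAX_PREMISES];
    size_t n_premises;
    int conclusion;
} Rule;

/* Forward chaining: keep applying rules until no new fact is derived.
   facts[i] is true when atom i is known.
   Returns the number of newly derived facts. */
size_t forward_chain(bool *facts, const Rule *rules, size_t n_rules) {
    size_t derived = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t r = 0; r < n_rules; r++) {
            if (facts[rules[r].conclusion]) {
                continue;  /* conclusion already known */
            }
            bool all = true;
            for (size_t p = 0; p < rules[r].n_premises; p++) {
                if (!facts[rules[r].premises[p]]) {
                    all = false;
                    break;
                }
            }
            if (all) {
                facts[rules[r].conclusion] = true;
                derived++;
                changed = true;
            }
        }
    }
    return derived;
}
```

The derived facts would then be rendered back into text and appended to the prompt. Whether the model translates reliably in both directions is exactly the open question.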
People have asked it to solve LeetCode problems, and it solves them. Those same people then make a minor modification to the problem, one that a novice programmer could handle, and it fails.
ChatGPT is simply amazing; it feels like Google with super powers. I think it can boost productivity by a considerable amount. It makes a perfect peer programmer, giving you sample code with first-class comments explaining the generated code, though sometimes with minor errors you have to fix to make it compile. You can even ask it to explain some specific part of the code. It's also like having a secretary or an assistant available 24/7 with never-before-seen productivity. It probably feels like when the first mechanical computers were built and people thought, "How can it compute the right answer so fast?"
You've… really got the right answers from it? I've only ever asked it a couple of really basic (non-trick! straightforward!) questions about how to accomplish tasks by programming, but it bullshitted me some very plausible nonsense. Once bitten, twice shy. If it were capable of saying "I can't answer that question" reliably (which I've spent some time trying and failing to make it do, in general), it might actually be useful.
It happened to me once: it generated a plausible answer when it didn't know (I tried to generate a Nix script with Erlang). But I have used it to generate example code in Haskell, and it was quite good, probably because Haskell libraries have excellent online documentation. It's much faster than reading the library's docs.
When it generates bullshit answers, just call it out and it will try another way to do it. Tell it specifically what doesn't make sense and it will fix it.
That does assume you can quickly and easily tell when it's bullshitting, which it's not always easy to do. As a way of learning new stuff, I'd strongly disrecommend it (because "when you're learning" is precisely when you're least able to identify the bullshit), although perhaps it's not the worst thing in the world if you're already an expert.
But wouldn’t Google with super powers produce correct answers to basic questions, though? I mean, really. It’s clearly very helpful at surface-level things for which there is a large corpus of examples, to some extent. I don’t really know if it’s “good” or if it is just surprising.
It has trained on countless programming tutorials out there, including bash tutorials for all kinds of things. Such tutorials often include "create file -> ls to see file -> print content of file" etc., so GPT takes those tutorials and creates grammatical rules for how those words transform into each other. But if you start going outside the realm of online tutorials, it starts to falter quickly and then just prints nonsense.
To add to this, each token is generated by going through its network once. So there's no way computationally for it to do any reasoning or follow an algorithm. That said, it's impressive how much it's able to accomplish without any procedural reasoning.
I've taken to saying that the ~emergent behavior here underscores what an incredible bootstrap written language is.
I think of it as a savant-esque capability. A glimpse of what we might be like if some slices of our language/memory faculties were turned up to 1111 while others are completely absent.
"Actual reasoning" doesn't mean anything concrete. Until you define what you are talking about, it can't be the basis of a question you can meaningfully answer.
I think Deutsch in The Beginning of Infinity has a lot to say about this, but I find it really hard to reproduce and summarise. Chapter 7, "Artificial Creativity", addresses the subject at length; here's Deutsch's own summary of it:
> The field of artificial (general) intelligence has made no progress because there is an unsolved philosophical problem at its heart: we do not understand how creativity works. Once that has been solved, programming it will not be difficult. Even artificial evolution may not have been achieved yet, despite appearances. There the problem is that we do not understand the nature of the universality of the DNA replication system.
He describes personhood, people, as "creative, universal explainers". That's his dividing line between what a child can do, and what LLMs cannot do. We don't know how a child works, but we do know how LLMs work.
While it's true that we don't know how intelligence/creativity/reasoning "works", there's an even more basic problem: it's not a well-formed notion.
“The result is that there is no hard problem… You can’t look for the answer to a problem unless you say here are the things I want to answer. If the things you want to answer have no formulation there is no problem.” (Chomsky, about the question of consciousness)
“There is a great deal of often heated debate about these matters in the literature of the cognitive sciences, artificial intelligence, and philosophy of mind, but it is hard to see that any serious question has been posed. The question of whether a computer is playing chess, or doing long division, or translating Chinese, is like the question of whether robots can murder or airplanes can fly — or people; after all, the “flight” of the Olympic long jump champion is only an order of magnitude short of that of the chicken champion (so I’m told). These are questions of decision, not fact; decision as to whether to adopt a certain metaphoric extension of common usage.
"There is no answer to the question whether airplanes really fly (though perhaps not space shuttles). Fooling people into mistaking a submarine for a whale doesn’t show that submarines really swim; nor does it fail to establish the fact. There is no fact, no meaningful question to be answered, as all agree, in this case. The same is true of computer programs, as Turing took pains to make clear in the 1950 paper that is regularly invoked in these discussions. Here he pointed out that the question whether machines think “may be too meaningless to deserve discussion,” being a question of decision, not fact, though he speculated that in 50 years, usage may have “altered so much that one will be able to speak of machines thinking without expecting to be contradicted” — as in the case of airplanes flying (in English, at least), but not submarines swimming. Such alteration of usage amounts to the replacement of one lexical item by another one with somewhat different properties. There is no empirical question as to whether this is the right or wrong decision. -- Chomsky"
I realize I don't have a good idea of what I think "actual reasoning" means. But yeah, this is pretty impressive stuff, I agree. Before ChatGPT I didn't realize the tech was available to do things like this, and I'm still pretty bewildered by how it can be possible.
You can directly ask it whether it is capable of reasoning and it tells you it's not, and that it's just a language model that is not capable of reasoning or self improvement or something along those lines.
Another example: ask it for a list of programming languages that it has been trained on. If it were capable of reasoning it would be able to trivially answer this, but since it's a language model, and it just predicts the most likely response based on the prompt, it has no concept of this at all, and tells you exactly that when asked.
Well, this is exactly it. We are being told to believe that the end of knowledge work is nigh, and yet this thesis is built on a bed of nonsense. It is just hand-wavy science fiction that even some academics with impressive pedigrees are promulgating. I think it is irresponsible: there is no sensible dialogue going on, so you have students freaking out and deciding what not to study, and people making career decisions, all based on this misinformation.
Here's a brief reminder of how large language models like GPT-3 work.
First, you train until the cows come home on billions of tokens on the entire web. This is called "pre-training", even though it's basically all of the model's training (i.e. the setting of its parameters, a.k.a. weights).
The trained model is a big, huge table of tokens and their probabilities to occur in a certain position relative to other tokens in the table. It is, in other words, a probability distribution over token collocations in the training set.
Given this trained model, a user can then give a sequence as an input to the model. This input is called a "prompt".
Given the input prompt, the model can be searched (by an outside process that is not part of the model itself) for a token with maximal probability conditioned on the prompt [1]. Semi-formally, that means, given a sequence of tokens t₁, ..., tₙ, finding a token tₙ₊₁ such that the conditional probability of the token, given the sequence, i.e. P(tₙ₊₁|t₁, ..., tₙ), is maximised.
Once a token that maximises that conditional probability is found... the system searches for another token.
And another.
And another.
This process typically stops when the sampling generates an end-of-sequence token (which is a magic marker tautologically saying, essentially, "Here be the end of a <token sequence>", and is not the same as an end-of-line, end-of-paragraph etc token; it depends on the tokenisation procedure used before training, to massage the training set into something trainable-on) [2].
Once the process stops, the sampling procedure spits out the sequence of tokens starting at tₙ₊₁.
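As a rough illustration of the loop described above, here is a minimal C sketch of greedy decoding. Everything in it is made up for illustration: the five-token vocabulary, the `P` table, and the function names. In particular, the toy table conditions only on the previous token, whereas a real model conditions on the whole sequence t₁, ..., tₙ.

```c
#include <stddef.h>

#define VOCAB 5
#define EOS 4   /* index of the end-of-sequence token in this toy vocabulary */

static const char *const TOKENS[VOCAB] = {"the", "cat", "sat", "down", "<eos>"};

/* Hypothetical stand-in for a trained model: P[prev][next] is the
   probability of `next` given only the previous token. */
static const double P[VOCAB][VOCAB] = {
    /* the  cat  sat  down <eos> */
    {0.0, 0.7, 0.1, 0.1, 0.1},  /* after "the"  */
    {0.1, 0.0, 0.6, 0.1, 0.2},  /* after "cat"  */
    {0.1, 0.0, 0.0, 0.7, 0.2},  /* after "sat"  */
    {0.1, 0.0, 0.0, 0.0, 0.9},  /* after "down" */
    {0.0, 0.0, 0.0, 0.0, 1.0},  /* after <eos>  */
};

/* Map a token id back to its text. */
const char *token_text(int id) {
    return TOKENS[id];
}

/* Return the token that maximises P(next | prev). */
static int argmax_next(int prev) {
    int best = 0;
    for (int t = 1; t < VOCAB; t++) {
        if (P[prev][t] > P[prev][best]) {
            best = t;
        }
    }
    return best;
}

/* Greedy decoding: starting from the last prompt token, append the most
   likely next token until <eos> is chosen or max_len tokens are written.
   Returns the number of generated tokens (the <eos> itself is not stored). */
size_t greedy_decode(int last_prompt_token, int *out, size_t max_len) {
    size_t n = 0;
    int prev = last_prompt_token;
    while (n < max_len) {
        int next = argmax_next(prev);
        if (next == EOS) {
            break;
        }
        out[n++] = next;
        prev = next;
    }
    return n;
}
```

With a prompt ending in "the" (`greedy_decode(0, out, 8)`), the loop emits the ids for "cat", "sat", "down" and then stops when `<eos>` becomes the most likely token. Note that the loop and the stopping rule live outside the table, which is the point being made here.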
Now, can you say where in all this is the "actual reasoning" you are concerned people are still claiming is not there?
____________
[1] This used to be called "sampling from the model's probability distribution". Nowadays it's called "Magick fairy dust learning with unicorn feelies" or something like that. I forget the exact term but you get the gist.
[2] Btw, this half-answers your question. Language models on their own can't even tell that a sentence is finished. What reasoning?
But my hunch/experience is that `proper prompting + nature of code being logical` really showcases the power of whatever statistical distribution and alignment occurs during generation. Further, there must be major upstream efforts in high-quality training data curation and creation, and advanced training tricks like using an LLM's strong ability in one task area to support the training of an area it is weak in.
My understanding is that the transformer layer in the LLM is basically doing something akin to message passing; it’s like a mini computer. In predicting the next word, it has to understand a lot about a lot of different kinds of topics.
My understanding is kinda fuzzy because I haven’t coded it up myself, but this was the takeaway I got from this explanation (starts at 36:21)
I ask it for the XOR swap trick and I get:
I ask for the bitwise OR swap trick and I get:
When asked for something which is invalid, but close to something it knows, it tends to produce stuff like this -- pattern matching its best guess.