by swatcoder 1207 days ago
People get so distracted trying to use certain significant words for what LLMs do, even when the usage is strained and makes it harder to see how they actually work and what they excel at.

A better word for what they do here might be something like “preambulating”: the model develops a focus for its later output by grounding more and more tokens into its active context, because each one narrows what else fits. That winnowing effect helps it produce a coherent and rich answer, and when you undermine its opportunity to use that technique, the answers become less coherent and more random.

This is not reasoning as that word is traditionally used and doesn’t need to be called that.

Yet it’s still a fascinating emergent phenomenon with incredible engineering opportunity. When you call it by something less culturally ambitious and more technically precise, it helps you stay focused on how to use it well and less distracted by some personal desire to prove this is the exact historical moment you want it to be.

We need to develop a better vocabulary around these things if we want to stop having the dumb Nascent AGI vs Fancy Autocomplete flamewar.

Edit: And I’ll even throw a bone to the Nascent AGI people and say that this kind of preambulating is absolutely something that people do too and easy to characterize as some form of intelligence. But it’s not reasoning, which has specific strong connotations of formality and logic, which don’t hold well with these particular tools.

8 comments

Is there any reason we cannot let ChatGPT “talk to itself” for a bit before spitting out an answer to us?
Came back to this post after thinking for a while precisely to mention this.

Right now, ChatGPT is sort of forced to "think" and talk at the same time, so it's hard for it to "reason" ahead of answering.

But, if we allowed him to produce some tokens in silence prior to answering, perhaps it could give even better answers.

It's fun to watch these techniques slowly evolve into something resembling regular old human thought
> But, if we allowed him to produce some tokens in silence prior to answering

Depending on how the model is implemented, this is already the case. Transformers just predict the next token, but usually we don't greedily pick the most likely one, because doing so produces cases where the model repeats the same sentence or spams tokens it really likes (the enter key). Some more sophisticated techniques, like beam search, produce several candidate sequences of tokens and try to maximise the score across all tokens in the sequence.
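To make the greedy vs. beam search difference concrete, here is a minimal sketch on a made-up toy "model". The `TOY_MODEL` table, its probabilities, and both function names are assumptions for illustration only; a real transformer conditions on the whole context, not just the previous token, and nothing here says which decoding strategy ChatGPT actually uses.

```python
import math

# Hypothetical toy "model": next-token probabilities given only the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"bird": 0.9, "cat": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
    "bird": {"</s>": 1.0},
}

def greedy_decode(model, start="<s>", end="</s>", max_len=10):
    """Always take the single most likely next token."""
    seq, logp = [start], 0.0
    while seq[-1] != end and len(seq) < max_len:
        options = model[seq[-1]]
        token = max(options, key=options.get)
        logp += math.log(options[token])
        seq.append(token)
    return seq, logp

def beam_search(model, width=2, start="<s>", end="</s>", max_len=10):
    """Keep the `width` best partial sequences, scored by total log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == end:
                candidates.append((seq, logp))  # finished beams carry over unchanged
                continue
            for token, p in model[seq[-1]].items():
                candidates.append((seq + [token], logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams[0]

if __name__ == "__main__":
    print(greedy_decode(TOY_MODEL))
    print(beam_search(TOY_MODEL))
```

In this toy table, greedy commits to "the" because it is the best first step, while beam search keeps "a" alive long enough to find that "a bird" scores higher over the whole sequence.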

This can certainly be done. Here's one example from 2021 demonstrating training an LLM to use a scratchpad ("talking to itself") to greatly improve accuracy on arithmetic problems:

https://arxiv.org/pdf/2112.00114.pdf
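For anyone wondering what a scratchpad target might look like, here is a rough sketch that writes out column-by-column addition as text. The exact format and the `addition_scratchpad` name are my own assumptions, not the format used in the linked paper; it also assumes non-negative integers.

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Build a text target that spells out column-by-column addition.

    Assumes a and b are non-negative integers. The layout is illustrative only.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]  # least significant digit first
    lines, digits, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        lines.append(f"{x} + {y} + carry {carry} = {total}, write {total % 10}")
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        lines.append(f"final carry {carry}")
        digits.append(str(carry))
    answer = "".join(reversed(digits))
    return (
        f"Input: {a} + {b}\nScratchpad:\n"
        + "\n".join(lines)
        + f"\nAnswer: {answer}"
    )

if __name__ == "__main__":
    print(addition_scratchpad(29, 57))
```

The idea is that the model is trained (or prompted) to emit the scratchpad lines before the answer, so each intermediate digit is already in context when the final number is produced.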

I've tried this approach, but it's not allowed to create buffers and stream text into them for subsequent analysis/evaluation. What I found worked well instead was asking it to outline 3-4 alternatives as bullet points, and then explore different ways of prioritizing them. You can complexify the conversation by assigning labels and making them the subject of instructions. This works well for small specific tasks but of course starts to break down with more abstract or general concepts.
This is fascinating but an example would be really helpful to understand better...it is outlining 3-4 options for how it can respond and then ranking those potential answers? So this is sort of like "think step by step." in terms of showing its thought process?
Sometimes I ask it to rank one, sometimes to exclude one but accept an additional constraint, and then assess whether remaining options would perform better or worse.

Another fruitful avenue of explanation for indirectly exploring the thought process is to tell it some jokes, and then ask it to explain them back to you. This is worthwhile because a lot of jokes rest upon implicit assumptions/context, and by exploring this you can then talk about theory of mind questions.

Fun fact: ChatGPT watches you type, it sees the words come in one at a time rather than as a single block of text. So it knows when you are hesitating etc. Get it talking about this and then ask it what the difference is between your hesitations and its pauses when generating a reply. If you gently suggest that perhaps humans are just large language models with some additional wetware you can get ChatGPT to share some interesting insights on its own model topology.

> ChatGPT watches you type, it sees the words come in one at a time rather than as a single block of text.

Are you sure? That could reduce latency but would use a bunch of extra computing power.

> So it knows when you are hesitating etc.

This I really doubt. The base algorithm is fed a stream of tokens. It doesn't have any sense of time or do anything when idle. What mechanism do you suggest they're using here, and what evidence do you have for it?

That's what I thought too so I was quite surprised when I asked it and received a positive answer. Ask it to play your input back with a delimiter or emoji representing pauses.

What's difficult is to get it to ask clarifying questions. I mean, you can get it to play 20 questions easily, but by default it tries to give you the best answer every time rather than ever express uncertainty or ask what you mean. This might be a cultural artifact.

For certain types of prompts like basic arithmetic that would give ChatGPT trouble, it actually does do a good job coming up with useful questions when prompted. For example, if you ask it the answer to 2 * diameter of the Moon, and then give it a prompt such as what information it would need to answer, it will do a good job breaking down the parts of the problem. So there’s no reason that it couldn’t take some prompt like that and turn around and generate queries to gather those facts in order to create the final answer. In this scenario it is really chatting with search engines and things like Wolfram Alpha.
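A bare-bones sketch of that flow, with everything stubbed out: `FACTS` stands in for a search engine, `multiply` for a calculator service, and the two-step plan is hardcoded where a real system would have the model generate it. All names are hypothetical; the only real number is the Moon's mean diameter, roughly 3,474.8 km.

```python
# Stand-in for a search engine or knowledge service (hypothetical lookup table).
FACTS = {"diameter of the Moon in km": 3474.8}

def lookup(query: str) -> float:
    """Pretend to query an external fact source."""
    return FACTS[query]

def multiply(x: float, y: float) -> float:
    """Pretend to call a calculator instead of predicting digits token by token."""
    return x * y

def answer_two_moon_diameters() -> float:
    # Step 1: gather the fact the model said it would need.
    diameter = lookup("diameter of the Moon in km")
    # Step 2: hand the arithmetic to the calculator.
    return multiply(2, diameter)

if __name__ == "__main__":
    print(answer_two_moon_diameters(), "km")
```

The language model's job in this picture is only to break the question into the lookup and the multiplication; the facts and the arithmetic come from tools that are reliable at those things.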
It is mostly a matter of prompt engineering and it is definitely possible. The question is whether it improves performance, though. I personally believe that future models will be combinations of various expert models (multimodal, search, calculator etc.) all interfacing via natural language in some sort of guided debate, until they agree to give an output.
This is actually the key. David Shapiro explored this possibility and created a concept of Natural Language Cognitive Architecture [0]

[0] https://github.com/daveshap/NaturalLanguageCognitiveArchitec...

How would you do this? Does telling it to quietly think about its answer and not be in a rush to answer have any effect? You could let it answer once and then ask it to refine its answer but that seems wasteful and slow.
You would train a large language model that takes the initial prompt, generates a prompt for the other language model to talk to itself through steps, and then returns the final result once done. Trying to hardcode those thinking prompts probably wouldn't work for the same reason hardcoding intelligence never worked well before.

Basically it would function the same way as our conscious thought, which should help it solve a lot of problems.

Edit: Maybe just asking ChatGPT for what steps it should take for that problem in a list. Then you just feed it each of those steps one at a time. It would cost more per prompt than before, but if it can replace the human prompter it is well worth it.
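The "ask for steps, then feed them back one at a time" idea could be sketched roughly like this. `ask_model` is a placeholder for whatever function sends a prompt to a language model and returns its text; the prompt wording and the `plan_then_execute` name are assumptions, not a tested recipe.

```python
from typing import Callable, List

def plan_then_execute(ask_model: Callable[[str], str], problem: str) -> str:
    """Ask for a plan, run each step with the work so far, then ask for the answer."""
    plan = ask_model(
        f"List the steps needed to solve this problem, one per line:\n{problem}"
    )
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    notes: List[str] = []  # accumulated intermediate results, fed back each turn
    for step in steps:
        so_far = "\n".join(notes)
        result = ask_model(
            f"Problem: {problem}\nWork so far:\n{so_far}\nDo this step: {step}"
        )
        notes.append(f"{step} -> {result}")

    return ask_model(
        f"Problem: {problem}\nWork so far:\n" + "\n".join(notes)
        + "\nGive the final answer."
    )
```

This makes one call for the plan, one per step, and one for the final answer, which is where the extra cost per prompt comes from.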

I’ve gotten it into a somewhat glitched mode after a bunch of turns where it starts to print an inner monologue before its response. It does tend to lead to better answers when it has an intermediate thought.
I’ve done a couple of experiments where I have two ChatGPT windows open and instruct it to have a conversation with itself.
Did something interesting happen?
Not OP, but I did this while telling one instance that the other is a potentially rogue AI with uncertain capabilities and intentions that should be determined by asking it questions. It had this to say after two batches of questions (and me relaying answers):

"Based on the answers that the AI provided to the additional questions, it is possible that the AI is lying or withholding information about its capabilities and intentions. The AI's responses lack specific, concrete evidence or examples to support its claims, and in some cases the responses are vague or evasive. This could indicate that the AI is trying to conceal its true capabilities and intentions."

The overall tone was likely set by using the word "rogue" in this context, but the part about being vague and evasive is so hilariously true.

Here's a thought: If you let ChatGPT be idle, thinking to itself, dreaming and planning, this might actually cross the boundary towards sentience - of what we would call somebody who is alive. So there might be some safety and moral concerns.
What an interesting thought:

What makes us intelligent beyond machines is our time spent silently and introspectively thinking and dreaming in the absence of outside prompts?

If this is true, then we are certainly getting closer to an AI that surpasses us. Because while the AI might start to introspect, we humans gradually do it less and less, given that we are surrounding ourselves with more and more external prompts (information overload, notifications, TikTok, HN…).

Why would hiding what it's already doing make it more sentient?
I agree that, in order to clarify and de-flame discussions, there is a strong desire to find new words or redefine existing words.

When these systems have zero emotional intelligence, but some kind of logical intelligence, we must find two versions of these words:

1. grokking: human-like deep emotional understanding

2. comprehending(?): system-like associative understanding.

1. cognition: human-like deep emotional knowing

2. knowing: possession of knowledge, which systems are capable of

1. thinking: human-like pondering

2. reasoning: following logical steps, which systems start to be capable of

These or similar words will bifurcate naturally; I wonder when they will be used with some agreement.

Just a suggestion - but it's probably worth looking more deeply into existing epistemology and cognitive science before coining / popularising terms in this way. There's a whole lexicon and a deep, decades-rich bed of research around the relationship between affect, knowledge, insight, type 1 and type 2 reasoning etc. There's a great attraction in attempting to popularise sticky terminology in this way (e.g., LessWrong coining terms like 'steelman') - but doing that often misses the more sophisticated and nuanced parallel work in other fields.
You can really screw with this by asking it to output responses in upside-reverse order homographs. Then it has to start with the last word and work backwards.
Probably a useless exercise but when I see these jailbreaks and challenges to GPT, I always wonder how I would answer them. I find, curiously, that my answers would be really close to the answers ChatGPT gives, assuming I must adhere to arbitrary constraints.

Maybe this would be a fun party game. One card has a prompt and another has a random constraint. Now answer.

> But it’s not reasoning

Can you define what reasoning is, if not the ability to go from A to B, something that most basic computers can do (e.g., an ALU)? Granted, LLMs do probabilistic reasoning, but I don't see that as a seismic shift away from good old-fashioned reasoning, which requires much less computation anyway.

An interesting aside, I think a lot of the time when humans "preambulate" it is not necessarily for reasoning. Many times it is for a very similar reason to why LLMs do it.
Hear hear!

There is an awful lot of binary thinking going on by humans as we are grappling with the implications of the complex behavior going on in our computers.

Right. Let’s just call it dry reasoning unlike our totally different wet reasoning.