by dcrimp 823 days ago
I had this thought the other day that the chain-of-thought reasoning pattern, which contributes to improved performance in LLM-based systems, seems to run parallel to Kahneman's two-system model of the mind that he covers in 'Thinking, Fast and Slow'.

Haven't read it in a few years, but I recall the book suggests that we use a 'System 1' in our brains primarily for low-effort, low-computation thinking, like 1+1=? or "the sky is ____".

It then suggests that we use a 'System 2' for deliberate, conscious, cognitively demanding tasks: multi-digit multiplication, reasoning problems, working with tools, generally just decision-making. Anything that requires focus or brain power. Our brain escalates tasks from S1 to S2 if they feel complex or dangerous.

Maybe I'm being too cute, but it feels like the critique that "LLMs aren't intelligent because they are stochastic parrots" is really an observation that they are only equipped to use their 'System 1'.

When we prompt an LLM to think step-by-step, we give it a workspace to write down its thoughts, which it can then consider in its next token prediction: a rudimentary System 2, like a deliberation sandbox.

We do a similar thing when we engage our System 2 - we hold a diorama of the world in the front of our mind, where we simulate what the environment will do if we proceed with a given action: how our friend might respond to what we say, how the sheet steel might bend under a force, how the code might break, how the tyres might grip. And we use that simulation to explore a tree of possibilities and decide on the action that rewards us the most.
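The "simulate, explore a tree of possibilities, pick the most rewarding action" loop described above can be sketched as a depth-limited lookahead search. This is a minimal sketch, not a model of the brain or of any particular LLM system; the toy world model (`toy_simulate`), its actions, and its rewards are hypothetical stand-ins for the mental diorama.

```python
from typing import Callable, List, Tuple

State = int
Action = str


def lookahead(
    state: State,
    actions: List[Action],
    simulate: Callable[[State, Action], Tuple[State, float]],
    depth: int,
) -> Tuple[float, List[Action]]:
    """Exhaustive depth-limited search: return the best total reward and the plan achieving it."""
    if depth == 0:
        return 0.0, []
    best_value = float("-inf")
    best_plan: List[Action] = []
    for action in actions:
        # Imagine the action in the internal model instead of actually taking it.
        next_state, reward = simulate(state, action)
        future_value, future_plan = lookahead(next_state, actions, simulate, depth - 1)
        total = reward + future_value
        if total > best_value:
            best_value, best_plan = total, [action] + future_plan
    return best_value, best_plan


# Hypothetical world model: walk left/right on a number line; reaching 3 pays off.
def toy_simulate(state: State, action: Action) -> Tuple[State, float]:
    next_state = state + (1 if action == "right" else -1)
    reward = 10.0 if next_state == 3 else -1.0
    return next_state, reward


value, plan = lookahead(0, ["left", "right"], toy_simulate, depth=3)
```

Here `plan` comes out as three "right" moves with a total of 8.0 (two steps costing -1 each, then the +10 payoff). The tree grows as `len(actions) ** depth`, which is why practical planners prune or sample rather than enumerate every branch.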

I'm no expert, but this paper seems to recognise a similar framework to the above. Perhaps a recurrent deliberation/simulation mechanism will make its way into models in the future, especially the action models we are seeing in robotics.

16 comments

I'll preface this by saying I know this may sound entirely made up, unscientific, anecdotal, naive, or adolescent even, but luckily nobody has to believe me...

A few weeks back I was in that limbo state where you're neither fully awake nor fully asleep and for some reason I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

It was like I was seeing my chain of thought as a list of ideas that was filled impossibly fast before it got summarized into a proper "thought" as a carefully selected list of words

I have since come to believe, as others have argued far more cogently before me, that what we perceive as our thoughts are, indeed, a curated output of the brainstormy process that immediately precedes it.

Well, this sounds weird to me in the sense that I don't feel that I think in _words_. I only convert my thoughts into words when I need to speak or write them down: when I need to communicate them to others, when I need to remember them for later, or when I am stuck and need to clear things up.

I was actually convinced it was the same for most people, and that for this reason "Rubber duck debugging"[1] is a thing.

1) https://en.wikipedia.org/wiki/Rubber_duck_debugging

Am I the only one visualizing some of my most creative thoughts in a mental palace formed by many distinct (Euclidean) spaces, whose axes connect to each other through a graph? The closest thing I've found that describes this is simplicial sets:

picture: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRx5Xam...

It seems it's used by cognitive models, although I'm not formally trained enough to tell exactly how:

https://arxiv.org/pdf/1703.08314.pdf

I wish I had something like this in my head to tie things together. Right now I feel like my understanding of things is so disorganised and "lucky" in a sense. I feel lucky that I have a grasp of anything.
Wow, well expressed. That's exactly how I feel. Not momentarily, but with everything. Though I am actually not intelligent, I just have good intuition and luck to grasp some of what I need to "understand".
Reminds me of the saying about a poet vs a mathematician: the former gives different names to the same thing and the latter the same name to different things. Maybe that's why I can't stand highly descriptive prose (aka describing the water while I'm drowning over here).

Now what if you're a poetic mathematician (or mathematical poet), what's that mind map look like?

Well... what about that palace of mind thing, and the ability to rewind into almost all older memories at will, and to look things up there on demand, like reading, without having them memorized at all? Also the full stream of consciousness, like smells, tastes, light wind on your skin, 'silken air' at just the right temperature and humidity.

All of that arranged in something like 'eigengrau', represented by glitterlike points connected by graphs, mostly in 'phosphene' colors, but not exclusively so.

Sometimes very non-euclidean, moving/warping.

KNOWING what's behind every glitter point, like small cinema, large home theatre, from several points of view at the same time.

No words involved. Just visuals.

Thinking, like juggling/weighing blobs, like that glowing stuff which moves slowly up and down in a lava-lamp.

Somehow 'knowing' what each blob, its size/form/viscosity/weight/speed/color/brightness/'feel'/smell represents.

Slowly emerging new 'visuals' from this. Which are then translated into 'language', if ever.

>phosphene color

Not sure whether you talk about the uranium yellow/green color, or the brief hallucination of a light spot (happened to me just a few minutes ago, hadn't had one in a long time).

I don't have such a hyperbolic mental palace, and this doesn't really give me the ability to establish a global map, but I relate a lot to what you wrote. Sometimes as I reach the climax of a long deep thought, I'm thinking via vision exclusively, to the extent that I don't even pay attention to what my outer eye sees, and I stumble upon some insight that is sometimes almost impossible to convey in language, not because it lies beyond it, but because the intrusion of language causes the idea to collapse: words point to dangling shapes that mean barely anything because the rest of the painting has gone away.

To those that have read this far and can't relate to this way of thinking, this isn't a superpower, those are rather rare experiences of altered states.

Talking about this is a kind of taboo and may cause some smiles, and indeed, if there is a deeper truth to these experiences about the computational or geometric nature of the mind, maybe in the same way synaesthesia mirrors spectrograms, it won't help people working in machine learning a lot (even though some, like LeCun, seem to use their own visual introspective abilities as a source of inspiration).

However, they may prove to be crucial in conceiving what kind of use brain chips should be put to. For now it seems we're walking through a thick fog in that direction, with envisioned applications confined to interfacing with external computers or increasing cognitive abilities quantitatively, such as perfect memory and so on. If I could sustain such experiences durably, with a high level of control and enhanced geometric/mathematical understanding, I believe this would be akin to a superpower, yes.

Like (parts of) this sort of thing maybe?

https://youtu.be/BLmAV6O_ea0?si=OdPbwBXs6mOR5Xj2

>Now what if you're a poetic mathematician (or mathematical poet), what's that mind map look like?

Well, look at the drawings I posted below: mathematical notions mixed with ad-hoc diagrammatic distinctive elements such as colors and marks. With maybe a theorem that posits that every mixed representation like these matches a colorless, unannotated, rigorous mathematical object?

In fact I come from a structural linguistics background, and when I pictured how one could extrude a semiotic square into another one, I felt like I understood the vague intuition behind homotopy type theory. The metaphor goes like this: the extrusion volume must be watertight for the squares to make sense.

Suppose you read Dostoyevsky's short story "Another Man's Wife and a Husband Under the Bed." In that case, you might notice that the protagonist's vertical position, as he eavesdrops on what he believes to be his wife through the wall of another man's apartment while standing alone in a corridor, mirrors the horizontal position he later assumes when hiding under the bed of his wife's presumed lover. This physical positioning reflects his moral descent, particularly as he is not alone this time. Beneath the bed with him is another man, clandestinely involved with yet another man's wife. This helps us see that our protagonist is just as disconnected from his wife as the man lying next to him under the bed, or the husband unknowingly sleeping above them, if not more so.

Granted, I don't have the detailed vision of this semiotic diagram, but coming up with the skeletal structure is exactly what the job of a semiotician consists of (which I'm not). What matters is that all these equivalence classes the writer lays down, just like in mathematics, allow meaning to flow. His vertical loneliness must match his horizontal promiscuity for the story to build this crescendo. Clog these connections, and the inner structure of the object they tie together disappears too. Digging into Saussure and Voevodsky, one can realize they shared a common obsession with identity, for it is precisely when physical objects become indistinguishable that they can be referred to with the same terms and that conceptuality arises (Aerts, 2010s and onward).

"Different names to the same thing" and the "same name to different things": the two directions on the homotopical ladder.

Note: I'm 100% in postmodern mode here, this goes way above my head of course.

I don't know what a simplicial set is and Wikipedia didn't really help me. However, I could roughly describe my "mind" as many mental maps where concepts are laid out and connected in different ways. Learning means putting new things on these maps, and thinking is navigating through them.
This is just a Deleuzian metaphor for the weird kind of space I perceive certain abstract thoughts with.

>many distinct (euclidian) spaces, whose axis connect to each other through a graph

Imagine having pictures hung on the walls of your mental palace that act as portals to other rooms and corridors within that palace, and that must exist in parallel to each other, in different "universes", otherwise their volumes would intersect. The kind of geometry the Antichamber video game features.

Or picture this: a representation that relies on its axis to convey meaning, for instance the political compass meme. Walk along an axis long enough and it will connect orthogonally to another axis, for instance, authoritarianism may connect to anger from the emotional compass.

Simplexes: a generalization of triangles to n dimensions. A 2-axis representation (the political compass, for example) could connect to spaces with 3 axes (the ascended political compass: https://external-preview.redd.it/UQgZCVQ4OLg_Hz16FGdu9-qxfq9...).

To represent this you could connect one tip of a segment (a 1-simplex) to the tip of a triangle (a 2-simplex), each vertex in these figures representing an axis. This is where my Deleuzian metaphor collapses, because I'm conflating the notion of an axis with the notion of the "left" and "right" parts of an axis. And I'd also be tempted to consider that planes should be allowed to connect to axes (to support the portal through a painting I mentioned above).

So this is just a sketchy thought, but it seems legitimate, as it's not something I conceptualize but something I perceive (sometimes). I think there may be something interesting behind these perceptions, because they seem to deal with separate concerns through a kind of structured orthogonal geometry: putting a concept in a dimension orthogonal to another concept doesn't make that dimension orthogonal to all other dimensions/concepts in your mental palace, as would be the case if it took the shape of an n-dimensional space. And because the orthogonality is structured, it lets you deal with more than 3 concepts spatially at the same time and embed them within something your eye can picture in 2D or 3D, using diagrammatic annotations (colors, marks, etc). Finally, it lets you put a concept C in several orthogonal relationships to distinct concepts, for instance A and B, and keep these different instantiations of concept C orthogonal to each other.

This is what my mind pictured as I was explaining this; colors and graduation marks/boxes faithfully represent what I just perceived: https://pasteboard.co/kMecyenyZdzg.png

Note that the two colors, the green of the axis and the red of the sticks, could be thought of as two individual concepts of their own, orthogonal to each other.

https://pasteboard.co/3VYEyepnVouQ.png

If a mathematician is reading this, please accept my deepest apologies. Here's another paper that seems thematically related to this: https://ieeexplore.ieee.org/abstract/document/10008602

Really interesting. I could guess that people that "think in words" are more likely to share their thoughts on social media, since they don't need to translate them into text/speech like people that "think in concepts"
I guess from the results of this thread a larger percentage of HN has this condition, but my understanding from reddit threads is that it is quite abnormal. I also lack an internal narrative, and I was quite shocked to find out that most people literally have a voice that they 'hear' internally.
I'll paste my reply to another comment on this thread:

> I could guess that people that "think in words" are more likely to share their thoughts on social media, since they don't need to translate them into text/speech like people that "think in concepts"

So, maybe word-thinkers are just overrepresented in "mainstream" social networks, and concept-thinkers are overrepresented in engineering circles?

Same. If I try to visualize my thoughts it’s like a cloud that coalesces into various forms, to show different scenarios. It definitely isn’t word-based until I decide to actually translate it into that mode.
Interesting. I think all of my thoughts are this record I'm listening to as if it's an audiobook almost. Sometimes, it's like multiple parallel streams of different thoughts at different strengths that I can observe, like a thought line that is going on, on a more subconscious level, and it's something that if I notice, I might want to pay attention to.

Like multiple LLMs are generating tokens in my head in parallel, but like in my field of view, some I can only listen/see barely because I'm not focusing on them.

There is a technique for achieving this state of consciousness; it’s called noting.

This is an awareness that advanced meditators seek, practice and develop to perceive “reality as it is”

If you are curious, you might find related discussions, and a great welcoming community at r/streamentry on Reddit

Also the book Mastering the Core Teachings of the Buddha talks about it quite a bit, including instructions on how to do it

Noting is very useful as long as you remember not to do it all the time.
If you don't remember then what? Stack overflow? Heap overflow?
Is this different from Dzogchen buddhism?
Noting is just a meditation technique

You might also call it an exercise for insight practice

There are multiple traditions that use noting or similar techniques for insight practice (maybe with different names)

Can’t vouch for this thread, as I just found it, but here’s a related discussion (Dzogchen vs Vipassana) https://www.reddit.com/r/Buddhism/comments/9t3095/dzogchen_v...

This is fascinating. I had another experience that I think sheds light on some of this. One day I was in my office and the lights were off. I turned around and looked at the dark shape on top of my coworker's desk. For a few seconds I stared blankly and then suddenly I had a thought: PC, it's his PC. Then I started to think about that period of time just before I realized what I was looking at... The only word I can use to describe what it felt like is: unconscious. Is it possible that consciousness is just a stream of recognition?
I think it's likely that consciousness is what you call it until you understand how it works.
I have this too. My cognitive processes are not related to my thinking brain, which I define as the part of my mental process which produces the sounds of words in my mind. Instead, I've observed that first, my subconscious processes concepts at a much more fine grained level, much like the latent space of a machine learning model. Only substantially after, let's say 10ms after, do thoughts arise, which are just pointers to the already processed subconscious process. A very rough analogy would be the inference of an LLM in words, vs all the processing of embeddings that happens internally.
I forget the name but I remember reading about this as a recognized process in neurology. We usually only hear the thought that wins, but there are many generated simultaneously, and there is a selection process.

Possibly related, I had a similar experience last night, where my mind simulated a fully realistic conversation between two people, with audio and video, except that the sentences made no sense. I thought that was interesting. My explanation was "the language part of your brain is too tired cause you've been using it all day."

Hm, interesting... I struggle with people understanding what I mean by having too many thoughts in parallel. I thought that's what ADHD is, but it turns out it's not. But I don't have a winning thought. I have to fight many of them and "pick" the winner, if you will. People always take it as a figure of speech, but I honestly struggle with it. It's not rare that I can just sit quietly and after a few hours be exhausted when I've finally finished thinking.

If you remember the official name, please let me know. I'd love to look into it more.

> I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

The way I’ve seen this described by psychologists is that System 1 is driving the car while System 2 panics in the back seat, screaming out explanations for every action and shouting directions to the driver so it can feel in control. The driver may listen to those directions, but there’s no direct link between System 2 in the backseat and System 1 holding the wheel.

Various experiments have shown that in many situations our actions come first and our conscious understanding/explanation of those actions comes second. Easiest observed in people with split brain operations. The wordy brain always thinks it’s in control even when we know for a fact it couldn’t possibly have been because the link has been surgically severed.

Being super tired, on the edge of sleep, or on drugs can disrupt these links enough to let you observe this directly. It’s pretty wild when it happens.

Another easy way, for me, is to get up on stage and give a talk. Your mouth runs away presenting things and you’re in the back of your head going “Oh shit no that’s going in the wrong direction and won’t make the right point, adjust course!”

Sometimes when I am in a Teams call, I observe myself talking. I know for myself that I can get carried away whilst talking and that time passes faster then. My conscious self sometimes needs to interrupt my talky self with a 'nough explained signal, or even with a 'nough joking signal.

I have read several studies showing that brains don't have a central point of command, so our true self cannot exist (as one single origin). We are the sum of all our consciousnesses, similar to how a car is the sum of its parts.

Oh, yes, that's what I do! I act first, and then consider the action.
It’s hard (impossible?) to know if we’re talking about the same thing or not, but I experience something like this all the time, without being on the edge of sleep. We might both be wrong, but it’s relatable!
This seems like it might upend Descartes' "cogito, ergo sum" ("I think therefore I am") in that the process for forming thoughts in a language is not indicative that we exist, rather it merely indicates that we have evolved a brain that can produce and interpret language.

Seems like we're dismantling a lot of what Descartes came up with these days.

For that I came up (or got inspired from somewhere) with this: I'm aware therefore I exist. Pure awareness, devoid of all objects (thoughts/visualization) is me.
From a positive perspective, it is clear that our thinking/mind is not just language and is always faster than sentence formation.
I had a similar experience when I was put under during surgery a few years ago. Later I learned that they used ketamine in their concoction.
I occasionally reach a similar state near sleep where I will be half-dreaming that I'm reading from a page of a book where the words materialize/"come into focus" right before my eyes into what is usually vaguely grammatically correct nonsense.
> curated output of the brainstormy process that immediately precedes it

Daniel Dennett gives a nice albeit more detailed version of your idea in his book Consciousness Explained, could be worth a read

Mandelthought psyt.
> it feels like critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

I wouldn't say LLMs aren't intelligent (at all) since they are based on prediction which I believe is the ability that we recognize as intelligence. Prediction is what our cortex has evolved to do.

Still, intelligence isn't an all or nothing ability - it exists on a spectrum (and not just an IQ score spectrum). My definition of intelligence is "degree of ability to correctly predict future outcomes based on past experience", so it depends on the mechanisms the system (biological or artificial) has available to recognize and predict patterns.

Intelligence also depends on experience, minimally to the extent that you can't recognize (and hence predict) what you don't have experience with, although our vocabulary for talking about this might be better if we distinguished predictive ability from experience rather than bundling them together as "intelligence".

If we compare the predictive machinery of LLMs vs our brain, there is obviously quite a lot missing. Certainly "thinking before speaking" (vs the fixed number of computation steps an LLM performs per token) is part of that, and this Q* approach and tree-of-thoughts will help towards that. Maybe some other missing pieces, such as the thalamo-cortical loop (iteration), can be retrofitted to the LLM/transformer approach too, but I think the critical piece missing for human-level capability is online learning - the ability to act then see the results of your action and learn from that.

We can build a "book smart" AGI (you can't learn what you haven't been exposed to, so maybe unfair to withhold the label "AGI" just because of that) based on current approach, but the only way to learn a skill is by practicing it and experimenting. You can't learn to be a developer, or anything else, just by reading a book or analyzing what other people have produced - you need to understand the real world results of your own predictions/actions, and learn from that.

Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel), also quite simple organisms make predictions (e.g., a predator jumping at prey makes a prediction about positions).
>Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel)

Would it?

Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?

What is the human predicting there? Why would it need to be a prediction task at all? How about a dada-ist poem? Made-up words and syntax? If it is prediction but the criterion for "what is a good next choice" can totally be made up on the fly - what does the word "prediction" even mean?
>What is the human predicting there?

Their next action - word put on page, and so on.

>Why would it need to be a prediction task at all?

What else would it be?

Note that prediction in LLM terminology doesn't mean "what is going to happen in the future" like Nostradamus. It means "what is a good next word given the input I was given and the words I've answered so far".

>How about a dada-ist poem? Made-up words and syntax?

How about it? People have their training (sensory input, stuff they've read, school, discussions) and sit to predict (come up with, based on what they know) a made-up word and then another.

That is a meaningless definition of prediction if "what is a good next word" has an ever changing definition in humans (as everything would fulfill that definition).
> Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?

Unless you're Stephen King on a cocaine bender, you don't typically write a novel in a single pass from start to finish. Most authors plan things out, at least to some degree, and go back to edit and rewrite parts of their work before calling it finished.

That can be expressed as text prediction. You output version 1 then output editing instructions or rewritten versions until you're done.

The real issue is running out of the input window.

> The real issue is running out of the input window.

Isn't this what abstractions are for? You summarise the key concepts into a new input window?
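The "summarise into a new input window" idea can be sketched as a compaction step: once the context exceeds a budget, older entries are folded into a single summary and only the most recent ones are kept verbatim. This is a minimal sketch under loud assumptions: word counts stand in for tokens, and `naive_summary` is a hypothetical placeholder for what would really be a model-written summary.

```python
from typing import Callable, List


def compact_context(
    messages: List[str],
    budget: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 2,
) -> List[str]:
    """Fold older messages into one summary whenever the context exceeds `budget` words."""

    def size(msgs: List[str]) -> int:
        # Word count as a crude stand-in for a token count.
        return sum(len(m.split()) for m in msgs)

    if size(messages) <= budget or len(messages) <= keep_recent:
        return messages
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


# Hypothetical stand-in for a real summary: keep the first word of each older message.
def naive_summary(msgs: List[str]) -> str:
    return "SUMMARY: " + " ".join(m.split()[0] for m in msgs if m.split())


msgs = ["chapter one draft text here", "edit notes for chapter one", "chapter two draft", "latest edit"]
compacted = compact_context(msgs, budget=5, summarize=naive_summary)
```

The catch is that the result can still exceed the budget, and whatever the summary drops is gone for good, so the quality of the abstraction step carries the whole approach.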

Maybe a better way to say it, rather than "intelligence is prediction", is that prediction is what supports the behaviors we see as intelligent. For example, prediction is the basis of what-if planning (multi-step prediction), prediction (as LLMs have proved) is the basis of learning and using language, prediction is the basis of modelling other people and their actions, etc. So, ultimately, the ability to write a novel is a result of prediction.

Yes, an insect (a praying mantis, perhaps) catching another is exhibiting some degree of prediction, and per my definition I'd say is exhibiting some (smallish) degree of intelligence in doing so, regardless of this presumably being a hard-coded behavior. Prediction becomes more and more useful the better you are at it, from avoiding predators, to predicting where the food is, etc, so this would appear to be the selection pressure that has evolved our cortex to be a very powerful prediction machine.

I think you're confusing prediction with ratiocination.

I'm sure you've deduced hypotheses based solely on the assertion that "contradiction and being are incompatible". Note that there was no prediction involved in that process.

I consider prediction a subset of reason, but not the converse. Therefore, I beg to differ on the whole assumption that "intelligence is prediction". It's more than that; prediction is but a subset of it.

This is perhaps the biggest reason for the high computational costs of LLMs, because they aren't taking the shortcuts necessary to achieve true intelligence, whatever that is.

> I think you're confusing prediction with ratiocination.

No, exactly not! Prediction is probabilistic and liable to be wrong, with those probabilities needing updating/refining.

Note that I'm primarily talking about prediction as the brain does it - not about LLMs, although LLMs have proved the power of prediction as a (the?) learning mechanism for language. Note though that the words predicted by LLMs are also just probabilities. These probabilities are sampled from (per a selected sampling "temperature" - degree of randomness) to pick which word to actually output.
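The sampling step mentioned above can be sketched in a few lines: the raw scores (logits) are divided by the temperature, turned into probabilities with a softmax, and one word is drawn from that distribution. This is a minimal sketch; the three-word vocabulary and its scores are made up for illustration, and real inference stacks typically add steps such as top-k or top-p filtering.

```python
import math
import random
from typing import Dict, Optional


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(
    logits: Dict[str, float], temperature: float, rng: Optional[random.Random] = None
) -> str:
    """Draw one token according to the temperature-adjusted probabilities."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical scores for the word after "the sky is".
logits = {"blue": 3.0, "grey": 1.5, "falling": 0.2}
word = sample_next_token(logits, temperature=1.0, rng=random.Random(0))
```

With these made-up scores, a temperature of 0.5 gives "blue" roughly 95% of the probability mass, while a temperature of 2.0 drops it to under 60%, so "grey" and "falling" get picked noticeably more often.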

The way the brain learns, from a starting point of knowing nothing, is to observe and predict that the same will happen next time, which it often will, once you've learnt what observations are appropriate to include or exclude from that prediction. This is all highly probabilistic, which is appropriate given that the thing being predicted (what'll happen if I throw a rock at that tiger?) is often semi-random in nature.

We can better rephrase "intelligence is ability to predict well", as "intelligence derives from ability to predict well". It does of course also depend on experience.

One reason why LLMs are so expensive to train is because they learn in an extremely brute force fashion from the highly redundant and repetitive output of others. Humans don't do that - if we're trying to learn something, or curious about it, we'll do focused experiments such as "Let's see what happens if I do this, since I don't already know", or "If I'm understanding this right, then if I do X then Y should happen".

The ability to write a novel is different from actually writing a novel. If prediction forms the basis of (at least some forms of) intelligence, intelligence itself is more than prediction.
That's why I say our vocabulary for talking about these things leaves something to be desired - the way we use the word "intelligence" combines both raw/potential ability to do something (prediction), and the experience we have that allows that ability to be utilized. The only way you are going to learn to actually write a novel is by a lot of reading and writing and learning how to write something that provides the experience you hope it to have.
Kind of agree. I think, though, trying to shoe-horn intelligence into some evolutionary concepts is tricky, because it is easy to stack hypotheses there.
>The ability to write a novel is different from actually writing a novel

In what way, except as in begging the question?

Which LLM will on its own go and write a novel? Also, even for humans, just because you technically know how to write a novel, you might fail at it.
LLMs have shown that writing a novel can be accomplished as an application of prediction, at least to a certain level of quality.
I have yet to see an LLM write a novel on its volition.
> online learning - the ability to act then see the results of your action and learn from that.

I don't think that should be necessary, if you are talking about weight updates. Offline batch mode Q-learning achieves the same thing.
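To make the "offline batch mode Q-learning" point concrete: a tabular learner can repeatedly sweep a fixed, previously logged set of transitions and still converge on action values, with no new interaction during training. This is a minimal sketch under toy assumptions (a hypothetical three-transition log, discount factor 0.9, learning rate 0.5); it is not how any particular LLM is trained.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# (state, action, reward, next_state, done) tuples logged earlier; nothing new is collected below.
Transition = Tuple[str, str, float, str, bool]


def offline_q_learning(
    dataset: List[Transition],
    actions: List[str],
    gamma: float = 0.9,
    alpha: float = 0.5,
    sweeps: int = 200,
) -> DefaultDict[Tuple[str, str], float]:
    """Tabular Q-learning run repeatedly over a fixed batch of logged transitions."""
    q: DefaultDict[Tuple[str, str], float] = defaultdict(float)  # unseen pairs default to 0.0
    for _ in range(sweeps):
        for state, action, reward, next_state, done in dataset:
            # Bootstrap from the best known value of the next state (zero if the episode ended).
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


# Hypothetical log: from "start", "go" reaches "goal"; "wait" stays put; "stop" at the goal pays 1.
log: List[Transition] = [
    ("start", "wait", 0.0, "start", False),
    ("start", "go", 0.0, "goal", False),
    ("goal", "stop", 1.0, "end", True),
]
q = offline_q_learning(log, actions=["wait", "go", "stop"])
```

The learner never acts during training, yet ends up valuing "go" (about 0.9) above "wait" (about 0.81). The limitation is that it can only learn about transitions that actually appear in the log.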

By online learning, did you mean working memory? I'd agree with that. Whether it's RAG, ultra-long context, an LSTM-like approach, or something else, is TBD.

By online learning I mean incremental real-time learning (as opposed to pre-training), such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take), then receive the sensory feedback of what actually happened, and use that feedback to improve your predictions for next time.

I don't think there is any substitute for a predict-act-learn loop here - you don't want to predict what someone else has done (which is essentially what LLMs learn from a training set), you want to learn how your OWN predictions are wrong, and how to update them.
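A predict-act-learn loop in the sense described above can be sketched as an online learner that makes a prediction, observes what actually happened, and immediately updates its parameters from its own error. This is a minimal sketch; the linear model, the learning rate, and the hidden rule `y = 2x + 1` that generates the feedback are all assumptions chosen for illustration.

```python
import random
from typing import Callable, Tuple


class OnlinePredictor:
    """Tiny linear model y ~ w*x + b, updated one observation at a time (online SGD)."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def learn(self, x: float, observed: float) -> float:
        # Compare our own prediction with what actually happened, then nudge the parameters.
        error = self.predict(x) - observed
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return error


def run_loop(world: Callable[[float], float], steps: int, seed: int = 0) -> Tuple[OnlinePredictor, float]:
    rng = random.Random(seed)
    model = OnlinePredictor()
    last_error = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # the situation we act in
        outcome = world(x)  # feedback from the environment
        last_error = model.learn(x, outcome)  # update right away, not in a later pre-training run
    return model, last_error


# Hypothetical environment whose rule the learner does not know in advance.
model, last_error = run_loop(lambda x: 2.0 * x + 1.0, steps=2000)
```

The contrast with a frozen, pre-trained model is the `learn` call inside the loop: the parameters change after every piece of feedback, so the predictor ends up close to `w = 2`, `b = 1` purely from its own mistakes.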

> By online learning I mean incremental real-time learning, such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take),

I used to believe this, but the recent era of LLMs has changed my mind. It's clear that the two things are not related: you don't need to update weights in real-time if you can hold context another way (attention) while predicting the next token.
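For readers unfamiliar with the mechanism being referred to: attention lets a model with frozen weights still use earlier context, because each new query is compared against the keys of all previous tokens and reads back a weighted mix of their values. The following is a minimal single-head scaled dot-product attention sketch in plain Python; the toy vectors are made up, and real transformers add learned projections, multiple heads, and masking.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query over the stored context."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the value vectors: context is held here, not written into any weights.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


# Hypothetical context of three earlier tokens, each with a 2-d key and value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out = attend([4.0, 0.0], keys, values)
```

Nothing in this function changes any parameters: remove an entry from `keys` and `values` and its influence on the output disappears, which is the difference between in-context use and permanent learning.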

The fact that we appear to remember things with one-shot, online training might be an illusion. It appears that we don't immediately update the weights (long term memory), but we store memories in short term memory first (e.g. https://www.scientificamerican.com/article/experts-short-ter...).

The fundamental difference is that humans do learn, permanently (eventually at least), from prediction feedback, however this works. I'm not convinced that STM is necessarily involved in this particular learning process (maybe just for episodic memories?), but it makes no difference - we do learn from the feedback.

An LLM can perform one-shot in-context learning, which in conversational mode will include (up to context limit) feedback from its actions (output), but this is never learned permanently.

The problem with LLMs not permanently learning from the feedback to their own actions is that it means they will never learn new skills - they are doomed to only learn what they were pre-trained with, which isn't going to include the skills of any specific job unless that specific on-the-job experience of when to do something, or avoid doing it, were made a part of it. The training data for this does not exist - it's not the millions of lines of code on GitHub or the bug fixes/solutions suggested on Stack Overflow - what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle (or equivalent for any other job/skill one might want them to acquire).

It's hard to see how OpenAI or anyone else could provide this on-the-job training to an LLM, even if they let it loose in a programming playground where it could generate the training dataset. How fast would the context fill with compiler/link errors, debugger output, program output, etc.? Once the context was full you'd have to pre-train on that (very slow, taking months, and expensive) before it could build on that experience. Days of human experience would take years to acquire. Maybe they could train it to write crud apps or some other low-hanging fruit, but it's hard to see this ever becoming the general-purpose "AI programmer" some people think is around the corner. The programming challenges of any specialized domain or task would require training for that domain - it just doesn't scale. You really need each individual deployed instance of an LLM/AI to be able to learn by itself - continuously and incrementally - to get the on-the-job training for any given use.
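A toy picture of the context-filling problem, assuming an in-context learner whose only memory of feedback is a fixed-size window (the `ContextWindow` class, its capacity, and the messages are hypothetical). Once the window is full, the earliest lesson is evicted, so nothing learned early in the session survives unless it is trained into the weights.

```python
from collections import deque


class ContextWindow:
    """Toy fixed-capacity context: each new entry pushes out the oldest one."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, entry):
        self.items.append(entry)

    def remembers(self, entry):
        return entry in self.items


ctx = ContextWindow(capacity=3)
ctx.add("lesson: run the tests before committing")  # hypothetical early feedback
for i in range(3):
    ctx.add(f"compiler error #{i}")                 # later noise fills the window
print(ctx.remembers("lesson: run the tests before committing"))  # False
```

Real context windows are measured in tokens rather than entries, but the eviction behavior is the point: a bigger window only delays the loss.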

> but this is never learned permanently.

Are you sure? I think "Open"AI uses the chat transcripts to help the next training run?

> they are doomed to only learn what they were pre-trained with

Fine-tuning.

> The training data for this does not exist

What does "this" refer to? Have you read the Voyager paper? (https://arxiv.org/abs/2305.16291) Any lesson learnt in the library could be used for fine-tuning or the next training run for a base model.

> what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle

Co-pilot gets to watch people figure stuff out - there's no reason that couldn't be used for the next version. Not only does it not need to read minds, but people go out of their way to write comments or chat messages to tell it what they think is going on and how to improve its code.

> Days of human experience would take years to acquire

And once learnt, that skill will never age, never get bored, never take annual leave, never go to the kids' football games, never die. It can be replicated as many millions of times as necessary.

> they could train it to write crud apps

To be fair, a lot of computer code is crud apps. But instead of learning it in one language, now it can do it in every language that existed on stackoverflow the day before its training run.

I'd say intelligence is a measure of how well you can make use of what you have. An intelligent person can take some pretty basic principles a really long way, for example. Similarly, they can take a basic comprehension of a system and build on it rapidly to get predictions for that system that go well beyond the level of experience they have. Anyone can gather experience, but not everyone can push that experience's capacity to predict beyond what it should enable.
To me, it is one of those things like defining what 'art' is, as in creating a model in our heads around a concept. We take our definitions and then use those to construct models like AI that simulate our model well enough.

In other words, I personally do not believe any system we develop will be truly 'intelligent', since intelligence is a concept we created to help explain ourselves. We can't even truly define it, and yet we test the technologies we develop to see if they possess it. It is a bit nonsensical to me.

Sure, we created the word intelligence to help describe ourselves and our differing levels of ability, as well as applying it to animals such as apes or dogs that seem to possess some similar abilities.

However, if we want to understand where this rather nebulous ability/quality of "intelligence" comes from, the obvious place to look is our cortex, which it turns out actually has rather simple architecture! If uncrumpled our cortex would be a thin sheet about the size of a tea towel, and consists of six layers of neurons of different types, with a specific pattern of connectivity, and including massive amounts of feedback. We can understand this architecture to be a prediction machine, which makes sense from an evolutionary point of view. Prediction is what lets you act according to what will happen in the future as opposed to being stuck in the present reacting to what is happening right now.

Now, if we analyze what capabilities arise from an ability to predict, such as multi-step what-if planning (multi-step prediction), ability to learn and use language (as proven by LLMs - a predict-next-word architecture), etc, etc, it does appear (to me at least!) that this predictive function of the cortex is behind all the abilities that we consider as "intelligence".
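Multi-step what-if planning as repeated prediction can be sketched as a brute-force search over action sequences, each scored by rolling a forward model ahead in imagination. The 1-D world, the `predict_next` model, and the goal are assumptions for illustration; this is not a claim about how the cortex implements planning.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # move left, stay, move right


def predict_next(position, action):
    """Hypothetical learned forward model: predicts where an action leads."""
    return max(0, min(10, position + action))  # world is clamped to [0, 10]


def plan(start, goal, horizon=4):
    """Imagine every action sequence, predict its outcome, keep the best one."""
    best_seq, best_dist = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        pos = start
        for a in seq:
            pos = predict_next(pos, a)  # multi-step prediction; no real action taken
        dist = abs(goal - pos)
        if dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq, best_dist


seq, dist = plan(start=2, goal=5)
```

Exhaustive search grows as 3^horizon, so anything beyond toy horizons needs pruning or sampling, but the core is the same: predict several steps ahead, then pick the action sequence whose predicted outcome is best.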

For sure there is very little agreement on a definition of intelligence, but I have offered here a very concrete definition "degree of ability to predict future outcomes based on past experience" that I think gets to the core of it.

Part of the problem people have in agreeing on a definition of intelligence is that this word arose from self-observation, as you suggest, and is more a matter of "I know it when I see it" than having any better-defined meaning. For technical discussion of AI/AGI and brain architecture we really need a rigorously defined vocabulary, and might be better off avoiding such a poorly defined concept in the first place, but it seems we are stuck with it since the word is so entrenched and people increasingly want to compare machines to ourselves and judge whether they too have this quality.

Of course we can test for intelligence, in ourselves as well as machines, by using things like IQ tests to see the degree to which we/they can do the things we regard as intelligent (we'd really need a much deeper set of tests than a standard IQ test to do a good job of assessing this), but the utility of understanding what is actually behind intelligence (prediction!) is that this allows us to purposefully design machines that have this property, and to increasing degrees of capability (via more powerful predictive architectures).

I think that is my overall point though - we created a system (AI) based on how we see one aspect of a particular organ or system (brain, cortex, etc.), and, in this case, labeled intelligence as 'predictive behavior', and so develop systems after that model. But for starters, only mammals and a few other life branches have cortexes, and cortexes weren't always around.

Evolutionary theory doesn't hinge on prediction itself; prediction is just one possible aspect of it. But organisms that rely on prediction, or primarily see themselves as predictive machines, will state the opposite, because we cannot do anything but model off what we think we know.

It is also further diluted in the sense that we are always limited in what we can model because of the digital nature of our medium as it attempts to model analog systems. It is like saying that the words that I am typing right now are just like having a real human conversation. No, not really. It is a diluted form of conversation that focuses on a specific, bare part of the communicative process.

I don't think people are, yet, deliberately creating predictive machines because they see that as the path to intelligence. Things like ChatGPT are LLMs, born out of that (language model) line of research, where the goal has been to learn the rules of language. The fact that a language model, when made large enough, appears somewhat intelligent was an unexpected surprise.

Different species have evolved to have different capabilities. Humans have evolved to be generalists, able to survive in a huge variety of environments, which requires a high degree of adaptability. The key to adaptability is prediction - the ability to very rapidly (in space of minutes/hours/days - not evolutionary timescales) learn how things work in a new environment or in new conditions.

Not all animals need this degree of adaptability, since they have been able to survive and thrive in long-lasting stable environments. Examples might be crocodiles or sharks - very low intelligence, but great at what they do. Evolution is not generally about prediction or intelligence - it's about optimizing each species for their own environment(s).

We already know how to build machines that are more like crocodiles - great at doing one thing over and over - but now we have the capability and desire to also build machines that are generalists like ourselves, and that requires us to figure out how to implement intelligence. Given how hard a problem this has been (and continues to be) to solve, it makes sense to look at our brains for inspiration: where does our own intelligence come from? It's highly notable that the part of our brain that most differentiates humans from other animals - our large neocortex - appears to be a prediction machine ... In studying humans no-one is saying that other animals are the same - it's just that humans are the animal whose capabilities we are trying to reproduce.

As I said, LLMs being intelligent was an accidental discovery - they were expected just to be language models, but it's certainly notable that the only thing they are trained to do is predict next word. They only do one thing, predict, and they exhibit unexpected intelligence, hmmm ...

At this point people are NOT yet all saying "prediction is the key to intelligence, so let's build predictive machines and assume they will be intelligent", but when you look at our cortex and look at LLMs, that does appear to be the obvious direction.

In this case I would say AI is the crocodile, the same as all life is. It's specializing (or becoming specialized) in something, which is prediction, in the same way a human (or any life that shows the same definitions of intelligence as us, like a crow solving a puzzle) can show success in a new or novel situation. But life does not need this definition of intelligence to survive, which leads to the basis of evolutionary theory. The trait of adaptability/prediction/intelligence is not always useful given a niche and can get weeded out, which is why most life does not need it, yet they are still around. In organisms that do possess it, it can be a detriment as well given specific situations (over analyzing, stuck in anxiety, excessive risks to adapt, etc.).

In other words, when we say an LLM is becoming intelligent, it's not that it is in the general sense. It's that we recognize the traits within it because the traits make sense to us and mimic what we define ourselves in terms of specializing, because quite obviously, we made it and provide its data input. But the key difference is that AI has none of the original impetus or evolutionary pressures that led to our own ability to generalize/specialize. This is because its output is derived from human input, which is fed into it through digitized means, which means there is always some kind of 'loss', since it is a specialized aspect of us.

It is why I made the reference to typing. We are communicating right now, but at the same time, it is a specialized form of it. It is not the full original human experience of talking to one another, but does not have to be in this case, because it works well enough and has some advantages given the niche. If we were using Facetime, it would be much closer, but still not quite the same as being in the same room face-to-face.

In my opinion, we are not so much prediction machines, but rather mimickers who can also create mimics of themselves via what we can make. You do not need to be able to predict that well if you can just mindlessly copy something that succeeded somehow.

Andrej Karpathy makes this same point, using the same book reference, in his "[1hr Talk] Intro to Large Language Models" video from Nov. 2023.

Here is a link to the relevant part of his presentation: https://youtu.be/zjkBMFhNj_g?t=2120

Weren't most of the claims in that book refuted, some even by the author? I really enjoyed it and found some great insights, only to be told later by a friend in that sphere that the book was not correct and that even the author had "retracted" some of the assertions.
It might still be a useful concept in developing LLMs.
He won a Nobel Prize for his work, so I'm not sure how much of it would be refuted.
One quick Google search and you can find multiple links about that, including some that were posted here. It wasn't proven to be false, but the evidence used was not much of evidence either.

Here's the first one in my results:

https://retractionwatch.com/2017/02/20/placed-much-faith-und...

Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."

https://meta.wikimedia.org/wiki/Cunningham%27s_Law

As luck would have it, a System 1 vs System 2 scenario falls into our laps.
People often say that LLMs aren't really thinking because they are just producing a stream of words (tokens really) reflexively based on some windows of previous text either read or from its own response. That is true.

But I have the experience when talking of not knowing what I'm going to say until I hear what I've said. Sometimes I do have deliberative thought and planning, trialing phrases in my head before uttering them, but apparently I'm mostly an LLM that is just generating a stream of tokens.

This is something that is easily observable by anyone at virtually any moment, yet at the same time is something that escapes 99% of the population.

When you are talking to someone in normal conversation, you are both taking in the words you are saying at the same time.

I'm currently reading it for the first time, completely coincidentally/not for this reason, and on a few occasions I've thought 'Gosh that's just like' or 'analogous to' or 'brilliant description of that problem' for LLMs/generative AI or some aspect of it. I wish I could recall some examples.
I think of COT as a memory scratchpad. It gives the LLM some limited write-only working memory that it can use for simple computations (or associations, in its case). Now suppose an LLM had re-writeable memory... I think every prompt-hack, of which COT is one example, is an opportunity for an architecture improvement.
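The write-only scratchpad idea can be sketched with long multiplication: intermediate results are appended to a list and later read back, but never revised. This is only an analogy under that assumption (the function name and note format are hypothetical); an actual LLM writes tokens, not Python strings.

```python
def multiply_with_scratchpad(a, b):
    """Long multiplication where each partial product is appended, never edited."""
    scratchpad = []  # append-only working memory, analogous to CoT tokens
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** place
        scratchpad.append(f"{a} x {digit} x 10^{place} = {partial}")
    # The final answer is computed by reading back the notes already written.
    total = sum(int(line.rsplit("= ", 1)[1]) for line in scratchpad)
    scratchpad.append(f"sum = {total}")
    return total, scratchpad


total, notes = multiply_with_scratchpad(347, 26)
print("\n".join(notes))
# 347 x 6 x 10^0 = 2082
# 347 x 2 x 10^1 = 6940
# sum = 9022
```

Each note is a small, easy step; the hard part (the full product) only has to be done by combining notes that already exist, which is roughly why writing intermediate steps helps.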
I think of COT more as a type of planning or thinking before you speak. If you just open your mouth and start talking, which is what a plain LLM does, then you may talk yourself into a corner with no good way to get out of it, or find yourself saying something that really makes no sense. COT effectively allows the LLM to see the potential continuations of what it is considering saying, and pick one that makes sense!

I think lack of COT or any ability to plan ahead is part of why LLMs are prone to hallucinate - if you've already run your mouth and said "the capital of Australia is", then it's a bit late to realize you don't know what it is. The plain LLM solution is to do what they always do and predict the next word using whatever it had in the training set, such as names of some Australian cities and maybe a notion that a capital should be a large, important city. IOW it'll hallucinate/bullshit a continuation word such as "Melbourne". With COT it would potentially have the ability to realize that "the capital of Australia is" is not a good way to start a sentence when you don't know the answer, and instead say "I don't know". Of course, the other cause of hallucinations is that the LLM might not even know what it doesn't know, so might think that "Melbourne" is a great answer.
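The "talk yourself into a corner" argument can be illustrated with a toy next-phrase table: greedy decoding commits to the locally most likely opening, whereas scoring whole candidate continuations before committing can prefer an answer that admits ignorance. The probabilities are invented, and exhaustive full-sequence scoring is just a simple stand-in for planning, not a description of how CoT works inside a real model.

```python
# Hypothetical toy "language model": phrase -> {next phrase: probability}.
MODEL = {
    "<start>": {"The capital of Australia is": 0.6, "I don't know.": 0.4},
    "The capital of Australia is": {"Melbourne.": 0.35, "Sydney.": 0.35, "Canberra.": 0.30},
    "I don't know.": {"<end>": 1.0},
}


def greedy(model, state="<start>"):
    """Commit to the single most likely next phrase at every step."""
    out, prob = [], 1.0
    while state in model:
        nxt, p = max(model[state].items(), key=lambda kv: kv[1])
        out.append(nxt)
        prob *= p
        state = nxt
    return out, prob


def lookahead(model, state="<start>"):
    """Score every complete continuation first, then pick the most probable one."""
    if state not in model:
        return [], 1.0
    best, best_p = None, -1.0
    for nxt, p in model[state].items():
        rest, rest_p = lookahead(model, nxt)
        if p * rest_p > best_p:
            best, best_p = [nxt] + rest, p * rest_p
    return best, best_p


print(greedy(MODEL))     # opens with the capital sentence, then has to guess a city
print(lookahead(MODEL))  # prefers "I don't know." once whole continuations are compared
```

Canberra is in the table, but with probability spread almost evenly across cities the toy model effectively doesn't know the answer, so any completed "The capital of Australia is ..." sentence is a low-confidence guess.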

Feel like this is better represented as the default mode network: https://en.m.wikipedia.org/wiki/Default_mode_network

There are questions we know the answers to and we just reflexively spit them out, but then there are questions that are new to us and we have to figure them out separately.

Recent research has shown that new memories are recorded in the brain differently depending on how unique the memory is: https://www.quantamagazine.org/the-usefulness-of-a-memory-gu...

I have a similar view to you and not much to add to your comment, other than to reference a couple books that you might like if you enjoyed 'Thinking, Fast and Slow'.

'The Righteous Mind' by Jonathan Haidt. Here, Haidt describes a very similar two-system model, which he calls the elephant and the rider.

'A Thousand Brains: A New Theory of Intelligence' by Jeff Hawkins. Here Jeff describes his Thousand Brains theory, which has commonality with the 2-system model described by Kahneman.

I think these theories of intelligence help pave the way for future improvements on LLMs for sure, so just want to share.

How does evolutionary instinct factor into the system model? Fight-or-flight responses, reflexes, etc. 'Thinking' does have consequences in terms of evolutionary survival in some circumstances, such as spending too much time deliberating/simulating.
This is a common comparison in the LLM world. I actually think it is closer to the Left/Right Brain differences described in Master and His Emissary, but that’s for a blog post later.
This sounds similar to the A Brain/B Brain concept that was described by, I believe, Marvin Minsky. I don't know how this might be related to Kahneman's work.
I had the same thought from Thinking, Fast and Slow.

Another variation of this seems to be the “thought loop” that agents such as Devin and AutoGPT use.

It’s a bit over my head for now but seems like GFlowNets are tackling this problem a bit.
interesting, hadn't come across these. Will be doing some more reading up on them.
that is the approach also taken in this paper for building LLM agents with metacognition: https://replicantlife.com/
thinking step-by-step requires 100% accuracy in each step. if you are 95% accurate in each step, then after the 10th step the accuracy of the reasoning chain drops to roughly 59.9% (0.95^10 ≈ 0.599). this is the fundamental problem with LLMs for reasoning.
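The compounding arithmetic is easy to check, assuming step errors are independent (an assumption the comment doesn't state; correlated or self-correcting errors would change the numbers):

```python
def chain_accuracy(per_step, steps):
    """Probability that every step is correct, assuming independent step errors."""
    return per_step ** steps


for n in (1, 5, 10, 20):
    print(n, round(chain_accuracy(0.95, n), 3))
# 1 0.95
# 5 0.774
# 10 0.599
# 20 0.358
```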

reasoning requires deterministic symbolic manipulation for accuracy. only then can it be composed into long chains.

You’ve never made a mistake in your reasoning?

Tongue in cheek but this has been considered and has resulted in experiments like tree of thought and various check your work and testing approaches. Thinking step by step is really just another way of saying make a plan or use an algorithm and when humans do either they need to periodically re-evaluate what they’ve done so far and ensure it’s correct.
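One way to see why "check your work" helps is to extend the 95%-per-step arithmetic from the comment above with a hypothetical checker that catches a fraction of wrong steps and lets the step be redone. The catch rate and retry budget are assumptions, and real verifiers are themselves fallible.

```python
def step_accuracy(p, catch_rate, retries):
    """P(step ends correct) when a checker catches errors and triggers a redo.

    Recurrence: P(k) = p + (1 - p) * catch_rate * P(k - 1), with P(0) = p.
    Uncaught errors slip through silently and count as a wrong step.
    """
    acc = p  # no retries left: only the first attempt counts
    for _ in range(retries):
        acc = p + (1 - p) * catch_rate * acc
    return acc


def chain_accuracy(p, steps, catch_rate=0.0, retries=0):
    """All-steps-correct probability, assuming independent steps."""
    return step_accuracy(p, catch_rate, retries) ** steps


print(round(chain_accuracy(0.95, 10), 3))  # 0.599, no checking
print(round(chain_accuracy(0.95, 10, catch_rate=0.9, retries=2), 3))  # roughly 0.95
```

Under these assumptions a checker that catches 90% of errors, with two retries per step, lifts the 10-step chain from about 60% to roughly 95%, which is the intuition behind tree-of-thought and check-your-work approaches.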

The trick is training the model to do this as a matter of course and to learn which tool to apply at the right time which is what the paper is about wrt interspersed thoughts.

>reasoning requires deterministic symbolic manipulation for accuracy

No, that is automation. Automated reasoning is a thing, indeed. And I can kind of see a world where there is a system which uses LLM for creative thinking, augmented with automated reasoning systems (think datalog, egg, SMT-solver, probabilistic model checking etc).
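As a small illustration of the deterministic symbolic side, here is naive forward chaining over Datalog-style rules in plain Python. The facts are made up, and a real system would use an actual Datalog engine or SMT solver rather than this loop; the point is that every derived fact follows exactly from the rules, so chain length doesn't erode correctness.

```python
def ancestors(parent_facts):
    """Naive fixpoint evaluation of:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    ancestor = set(parent_facts)
    while True:
        derived = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if derived <= ancestor:  # fixpoint reached: nothing new can be derived
            return ancestor
        ancestor |= derived


# Hypothetical family tree: a 20-link chain of parent facts.
chain = [(f"p{i}", f"p{i + 1}") for i in range(20)]
result = ancestors(chain)
print(("p0", "p20") in result)  # True: a 20-step deduction with no accumulated error
```

In the hybrid world described above, an LLM might propose the facts or rules, and an engine like this would do the chaining exactly.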

I dream of a world where the majority of humans could come close to 59% after attempting a ten step logical process.
wut

the average theorem in Euclid's Elements (written over 2000 years ago) would have a reasoning chain of at least 10 steps.

all of the mathematical machinery humans build needs 100% accuracy in each step

all human knowledge is created by a small number of people. most of us just regurgitate and use it.

think Euclid, Galileo, Newton, Maxwell, etc.

and all human knowledge is mathematical in nature (Galileo said this).

what is meant here is that facts and events in the world we perceive can be compressed into small models which are mathematical in nature and allow a deductive method.

human genius consists of coming up with these models. This process is described by Peirce (and Kant before him), i.e., inventing concepts and relations between them to build models of the world we live in.

imagine compressing all observed motion into a few equations of physics. or compressing all electromagnetic phenomena into a few equations. and then using this machinery to make things happen.

imagine if we fed a lot of perceived motion data into a giant black box (which could be a neural net), and out came a small model of that data comprising Newton's equations (and similarly Maxwell's equations).
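A toy version of that compression step: given (time, distance fallen) samples, fit the one-parameter model d = 0.5 * g * t^2 by least squares and read off g. The data are synthetic, generated here from g = 9.81 with a little made-up measurement noise, and the form of the equation is assumed up front; discovering that form is the genuinely hard part this sketch skips.

```python
import random


def fit_g(samples):
    """Least-squares fit of d = 0.5 * g * t**2 for the single parameter g.

    With x = 0.5 * t**2, minimizing sum((d - g*x)**2) gives g = sum(x*d) / sum(x*x).
    """
    xs = [0.5 * t * t for t, _ in samples]
    num = sum(x * d for x, (_, d) in zip(xs, samples))
    den = sum(x * x for x in xs)
    return num / den


rng = random.Random(42)
# Synthetic "observations" of a falling object at t = 0.1 s .. 3.0 s, with small noise.
samples = [(t / 10, 0.5 * 9.81 * (t / 10) ** 2 + rng.gauss(0, 0.01)) for t in range(1, 31)]
g = fit_g(samples)
print(round(g, 2))  # close to 9.81
```

Thirty noisy measurements collapse into one constant plus a known formula, which is the kind of compression the comment describes, in miniature.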

But this giant knowledge edifice is built on solid foundations of mathematical reasoning (Newton said this).

human genius is to invent a mathematical language to describe imaginary worlds precisely, and then a scientific method to apply that language to model the real world.