by killerstorm 1188 days ago
GPT-4 (as well as all GPTs before it) has a limitation: it has to produce an output in a single pass. It cannot pause and think, and it cannot backtrack. So yes, it makes weird mistakes sometimes.

A human programmer will need to look at code, then think a bit, then look at it again, etc.

You can put programmers into a similar situation: try reading code aloud instead of showing it to them on screen. If they can't answer right, does it mean they aren't intelligent? Intelligence =/= never making a mistake.

Now that you know GPT's limitations, perhaps you would consider asking questions one at a time instead of intentionally trying to confuse it. Considering multiple questions in a single pass increases the error rate.

> so probably any question you ask it has already been asked somewhere and any useful code you present to it or ask it to generate is included in its training data

The "it just recites training data" idea is demonstrably false. Do a bit of combinatorics. Or write a unique piece of code which is not a gotcha question and try it.
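A back-of-the-envelope version of that combinatorics, as a rough sketch. The vocabulary size, snippet length, and corpus size below are illustrative assumptions picked for the arithmetic, not published figures for GPT-4 or any other model.

```python
import math

# Illustrative assumptions only, not real figures for any model.
VOCAB_SIZE = 50_000        # assumed number of distinct tokens
SNIPPET_LENGTH = 50        # assumed length of a short code snippet, in tokens
CORPUS_TOKENS = 10 ** 13   # assumed (generous) training-corpus size, in tokens


def log10_possible_sequences(vocab_size: int, length: int) -> float:
    """log10 of the number of distinct token sequences of the given length."""
    return length * math.log10(vocab_size)


def log10_corpus_windows(corpus_tokens: int) -> float:
    """log10 of an upper bound on the fixed-length windows a corpus contains."""
    # A corpus of N tokens has at most N windows of any fixed length.
    return math.log10(corpus_tokens)


possible = log10_possible_sequences(VOCAB_SIZE, SNIPPET_LENGTH)
seen = log10_corpus_windows(CORPUS_TOKENS)
print(f"possible sequences: about 10^{possible:.0f}")
print(f"windows in corpus:  at most 10^{seen:.0f}")
```

Most of those possible sequences are gibberish, so this overstates the gap, but it gives a sense of the scale involved.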

2 comments

GPT is a transformer model. Transformers use the attention mechanism. The mechanism is entirely concerned with retaining semantic context and semantic "global dependencies" spanning the entire input and output.

https://ar5iv.labs.arxiv.org/html/1706.03762

"Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences ...

In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output."
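For anyone who wants to see what "drawing global dependencies" looks like mechanically, here is a minimal pure-Python sketch of the scaled dot-product attention described in the linked paper, softmax(QK^T / sqrt(d_k)) V. It is a toy for a single head with no learned projections; the function names and list-of-lists layout are my own choices, not anything from a real library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for a single head.

    queries, keys, values are lists of vectors (lists of floats).
    Every query is scored against every key, regardless of distance.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Dot product of this query with each key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

Note that nothing in the scoring step depends on how far apart two positions are, which is the "without regard to their distance" part of the quote.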

Beyond that, also note that LLMs are probabilistic machines. Output spat out can vary and there are a handful of knobs (such as temperature) to modulate that output.
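To make the temperature knob concrete, here is a small sketch of how it is commonly applied: the logits are divided by the temperature before the softmax. The function name and the example logits are made up for illustration; this is the textbook formulation, not a claim about how any particular API implements it.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.1, 1.0, 100.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Low temperatures push nearly all the probability onto the top-scoring token, while high temperatures flatten the distribution toward uniform, which is why sampled output varies more.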

Finally, I'm pretty sure we (or the workers in the field more like it /g) don't have a firm grasp on why certain failure modes occur. Likely this is due to the fact that we (they) also don't really have a good grasp on how the damn thing actually works its 'magic'.

What is clear is that a significant subset of our semantic universe is embedded in symbols and their usage by us and this subset is somehow encoded in neural nets. This captured subset in LLMs is what drives their uncanny generative abilities. What is missing is precisely what would make it plausibly intelligent, plausibly a reasoning agent operating in a coherent semantic context.

There are some who claim our minds are just like LLMs. Some of us who pay attention to our minds sometimes catch them making nonsensical noises and correct them. (As you age you begin to notice these things...) So it is interesting to this sentient (who makes claims to being) that my mind is just like my body: it is aging, certain parts are degraded, etc., but my 'whateveritis' that is me, my self, is as timeless as ever, and seems to be a spectator of the aging mechanism...

> The mechanism is entirely concerned with retaining semantic context and semantic "global dependencies" spanning the entire input and output.

This is not quite true: GPT, specifically, is auto-regressive. It computes things only looking back, not forward.
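A toy sketch of what "only looking back" means in terms of attention weights, assuming the usual causal-mask formulation: position i may attend to positions 0..i and gets zero weight on anything later. The function name and the score matrix are hypothetical.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax where position i only sees positions 0..i.

    scores[i][j] is the raw attention score of query position i
    against key position j.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # drop everything to the right of position i
        peak = max(visible)
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions get exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


raw_scores = [
    [0.5, 2.0, 1.0],
    [1.0, 1.0, 3.0],
    [0.2, 0.4, 0.6],
]
for row in causal_attention_weights(raw_scores):
    print([round(w, 3) for w in row])
```

The first token can only attend to itself, no matter how high its scores against later tokens are.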

Given that each token has only a fixed computing budget, it is likely that GPT precomputes information which will be relevant to later tokens, to be routed via attention.

In fact, this effect was demonstrated in practice: e.g. in a prompt like "Question: Where is the Eiffel tower located? Answer: " people found that information about "Paris" is routed from tokens "Eiffel tower", i.e. this associative memory was looked up earlier than it was needed.

So I was answering from that perspective: it can do better if it knows what to pre-compute.

Like I said in the parent, I use this tool heavily every day for coding and non-coding work. The above was meant to be a tiny illustration. I have had long back-and-forths with GPT-4, trying to get it to understand or generate some nontrivial, useful code, and no matter how I phrased it, it got it completely wrong, even though the output looked superficially plausible.