| > One reason why just blathering on endlessly... First of all, I would urge you to stop arbitrarily using negative words to make an argument. Saying that LLMs are "blathering" is equivalent to saying you and I are "smacking meat onto plastic to communicate" - it's completely empty of any meaning. This "vibes based arguing" is common in these discussions and a massive waste of time. Now, I don't really understand what you mean by "almost impossible to maintain long-term context/attention". I'm writing fiction in my spare time, LLMs do very well on this by my testing, even subtle and complex simulations of environments, including keeping track of multiple "off-screen" dynamics like a pot boiling over. There is nothing "1-dimensional" about the context, unless you mean that it is directional in time, which any human thought is as well, of course. As I said in my original reply, each token is represented by a multidimensional embedding, and even that is abstracted away by the time inference reaches the later layers. The word "citrus" isn't just a word for the LLM, just as it isn't just a word for you. Its internal representation retrieves all the contextual understanding that is related to it. Properties, associated feelings, usage - every relevant abstract concept is considered. And these concepts interact which every embedding of every other token in the input in a learned way, and with the position they have relative to each other. And then when an output is generated from that dynamic, said output influences the dynamic in a way that is just as multidimensional. The model can maintain context as rich as it wants, and it can built upon that context in whatever way it wants as well. The problem is that in some domains, it didn't get enough training time to build robust transformation rules, leading it to draw false conclusions. You should reflect on why you are only able to provide vague and under defined, often incorrect, arguments here. You're drawing distinctions that don't really exist and trying to hide that by appealing to false intuitions. > The reasoning weakness... it's a fundamental architecturally-based limitation... You have provided no evidence or reasoning for that conclusion. The river crossing puzzle is exactly what I had in mind when talking about specific domains. It is a common trick question with little to no variation and LLMs have overfit on that specific form of the problem. Translate it to any other version - say transferring potatoes from one pot to the next, or even a mathematical description of sets being modified - and the models do just fine. This is like tricking a human with the "As I was going to Saint Ives" question, exploiting their expectation of having to do arithmetic because it looks superficially like a math problem, and then concluding that they are fundamentally unable to reason. > People like Demis Hassabis (CEO of DeepMind) acknowledge the weakness too. What weakness? That current LLMs aren't as good as humans when reasoning over certain domains? I don't follow him personally but I doubt he would have the confidence to make any claims about fundamental inabilities of the transformer architecture. And even if he did, I could name you a couple of CEOs of AI labs with better models that would disagree, or even Turing award laureates. This is by no means a consensus stance in the expert community. |
I disagree - there is pretty widespread agreement that reasoning is a weakness, even among the best models, (and note Chollet's $1M ARC prize competition to spur improvements), but the big labs all seem to think that post-training can fix it. To me this is whack-a-mole wishful thinking (reminds me of CYC - just add more rules!). At least one of your "Turing award laureates" thinks Transformers are a complete dead end as far as AGI goes.
We'll see soon enough who's right.