	by versteegen 1227 days ago
	Why think only about "incremental" improvement? People aren't just making slight tweaks; new papers are published at a remarkable rate, with people trying significantly different architectures, training methods, etc., and that steady progress leads to ever more impressive results. How can you assume this direction of research will lead nowhere? OK, ignore everyone who doesn't understand the technology. Of those who do, I'm utterly amazed how pessimistic many are that this "isn't capable" of leading to AGI. Probably not Transformers specially, but LLMs show that intelligence is remarkably easy. You don't even need to put anything in the neural architecture designed to perform reasoning tasks, but they can be learnt regardless, because Transformers are flexible enough to learn to emulate computation (Turing machines) with bounded space and time, going beyond the famous result that 2-layer MLPs are universal function approximators.

3 comments

LegionMammal978 1227 days ago

> Probably not Transformers specially, but LLMs show that intelligence is remarkably easy.

LLMs show that language is remarkably easy. Ever since GPT-3 was released, I've been convinced that language comprehension isn't nearly as big a component of general intelligence as people are making it out to be. This makes some intuitive sense: I recall a writer for a tabloid expressing that they simply turn off their brain and start spinning up paragraphs.

But so far, I haven't seen any of these models perform logical reasoning, beyond basic memorization and reasoning by analogy. They can tell you all day what their "reasoning process" is, but the actual content of any step is simply something that looks like it would fit in that step. Where do you derive this confidence that advanced logical reasoning is a natural capability of transformer models? (Being capable of emulating finite Turing machines is hardly impressive: any sufficiently large finite circuit can do that.)

hackinthebochs 1227 days ago

>Ever since GPT-3 was released, I've been convinced that language comprehension isn't nearly as big a component of general intelligence as people are making it out to be

"X is the key to intelligence"

computers do X

"Well actually, X isn't that hard..."

rinse and repeat 100x

At some point you have to stop and reflect on whether your concept of intelligence is faulty. All the milestones that came and went (arithmetic, simulations, chess, image recognition, language, etc) are all facets of intelligence. It's not that we're discovering intelligence isn't this or that computational feat, but that intelligence is just made up of many computational feats. Eventually we will have them all covered, much sooner than the naysayers think.

LegionMammal978 1227 days ago

> All the milestones that came and went (arithmetic, simulations, chess, image recognition, language, etc) are all facets of intelligence.

Why should I have to care about those weird milestones that some other randos came up with once upon a time? I've never espoused any of those myself, so how is this supposed to prove anything about my thought process?

> It's not that we're discovering intelligence isn't this or that computational feat, but that intelligence is just made up of many computational feats. Eventually we will have them all covered, much sooner than the naysayers think.

Well, it certainly appears to me like there's a big qualitative difference between the capabilities you mentioned (arithmetic and simulations are just applications of predefined algorithms; chess, image recognition, and language are memorization, association, and analogy on a massive scale) and the kind of ad-hoc multi-step logical reasoning that I'd expect from any AGI. You can argue that the difference is purely illusory, but I'll have a very hard time believing that until I see it with my own eyes.

hackinthebochs 1226 days ago

>so how is this supposed to prove anything about my thought process?

Because it's the same thought process that animated theorists of the past. Unless you have some novel argument to demonstrate why language isn't a feature of intelligence despite wide acceptance pre-LLMs, the claim can be dismissed as an instance of this pernicious pattern. Just because computers can do it and it isn't incomprehensibly complex doesn't mean it's not a feature of intelligence.

>Well, it certainly appears to me like there's a big qualitative difference between the capabilities you mentioned... and the kind of ad-hoc multi-step logical reasoning that I'd expect from any AGI.

I don't know what "qualitative" means here, but I agree there is a difference in the kind of computation. But I expect multistep reasoning to just be variations of the kinds of computations we already know how to do. Multistep reasoning is a kind of search problem over semantic space. LLMs handle mapping the semantic space, and our knowledge from solving games can inform a kind of heuristic search. Multistep reasoning will fall to a meta-computational search through semantic space. ChatGPT can already do passable multistep reasoning when guided by the user. An architecture with a meta-computational control mechanism can learn to do this through self-supervision. The current limitations of LLMs are not due to fundamental limits of Transformers, but rather are architectural, as in the kinds of information flow paths that are allowed. In fact, I will be so bold as to say that such a meta-computational architecture will be conscious.

terminal_d 1226 days ago

I think that's more representative of tabloid writers than anything, haha. Understanding text is difficult, and scales with g. GPT-3 can make us believe that it can comprehend text that falls in the median of internet content, and I guess there would have to be some edge cases addressed by the devs, but it can't convince humans that it understands more difficult content, or even content that isn't in its db.

versteegen 1225 days ago

I totally agree with your comments on language. I was stretching it to cover "intelligence" too; what I should have said is "many components of intelligence". It really isn't one thing. But I think analogical reasoning is one of the most important components, maybe the most important! I'm not alone. [1]

> Where do you derive this confidence that advanced logical reasoning is a natural capability of transformer models?

("Advanced logical reasoning" is asking a lot, more than I wanted to claim.) I was going off papers like [2], which showed very high accuracy for multi-hop reasoning by fine-tuning RoBERTa-large on a synthetic dataset, including for more hops than seen in training (although experiments "suggests that our results are not specific to RoBERTa or transformers, although transformers learn the tasks more easily"). Meanwhile, [3] found "that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems of substantially increased difficulty" but "transformer models’ limited scale-invariance suggests they are far from learning robust deductive reasoning algorithms". I think that low scalability is to be expected: transformers don't have a working memory on which they can iterate learnt algorithmic steps, so only a fixed number of steps can be learnt (as I was saying).
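To make the "fixed number of steps" point concrete, here is a toy sketch of multi-hop rule reasoning as plain forward chaining. Everything in it is an assumption for illustration: the facts, the rules, and the `forward_chain` helper are made up, and it says nothing about how RuleTaker builds its data or how a transformer computes internally. It only shows why a procedure capped at k rounds cannot reach conclusions that need more than k hops, while one that iterates to a fixpoint can.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

# A rule is (premises, conclusion). Hypothetical toy format, not RuleTaker's.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(
    facts: Iterable[str],
    rules: List[Rule],
    max_steps: Optional[int] = None,
) -> Set[str]:
    """Derive new facts in rounds of rule application.

    With max_steps=k, stop after k rounds (a fixed computation depth).
    With max_steps=None, keep iterating until nothing new is derived.
    """
    known = set(facts)
    steps = 0
    while max_steps is None or steps < max_steps:
        # One "hop": fire every rule whose premises are all already known.
        new = {
            conclusion
            for premises, conclusion in rules
            if premises <= known and conclusion not in known
        }
        if not new:
            break  # fixpoint reached
        known |= new
        steps += 1
    return known


# Made-up chain a -> b -> c -> d -> e, so "e" needs four hops from "a".
FACTS = {"a"}
RULES: List[Rule] = [
    (frozenset({"a"}), "b"),
    (frozenset({"b"}), "c"),
    (frozenset({"c"}), "d"),
    (frozenset({"d"}), "e"),
]

two_hops = forward_chain(FACTS, RULES, max_steps=2)
all_hops = forward_chain(FACTS, RULES)

print(sorted(two_hops))  # ['a', 'b', 'c']
print(sorted(all_hops))  # ['a', 'b', 'c', 'd', 'e']
```

The capped call stops at "c" because each round only adds conclusions one hop further out; reaching "e" takes four rounds, which the uncapped loop performs by reusing the same step.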

Unfortunately, looking for other papers, I found [4] which pours a lot of cold water on [2], saying "a deeper analysis reveals that they appear to overfit to superficial patterns in the data rather than acquiring the logical principles governing the reasoning in these fragments". I suppose you were more correct. I still think there's more than just memorisation happening here, and it isn't necessarily dissimilar to intuitive (rapid) 'reasoning' in humans, but as with everything in LLMs, everything is muddied because capability seems to be a continuum.

[1] Hofstadter, 2001, Analogy as the core of cognition, http://worrydream.com/refs/Hofstadter%20-%20Analogy%20as%20t...

[2] AI2, 2020, RuleTaker: Transformers as Soft Reasoners over Language, https://allenai.org/data/ruletaker

[3] Richardson &al. 2021, Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability, https://arxiv.org/abs/2112.09054

[4] Schlegel &al. 2022, Can Transformers Reason in Fragments of Natural Language?, https://arxiv.org/abs/2211.05417

lolinder 1226 days ago

I never said incremental improvements to LLMs won't lead anywhere; I said they won't replace me. A sibling has already commented on why that would be, and I agree with them.

I just wanted to chime in and repeat the other part of my argument: my job is not threatened until we have AGI, and AGI would be so earth-shattering to the entire premise of our economy that there's literally no point in worrying about it as an individual. We can and should talk about society-level changes like UBI, but having individual anxiety about your own personal job is a strange response to the end of the entire global economic system.

greenyoda 1227 days ago

> You don't even need to put anything in the neural architecture designed to perform reasoning tasks, but they can be learnt...

That sounds interesting. Can you provide a reference to this research?

versteegen 1225 days ago

See my reply to sibling: https://news.ycombinator.com/item?id=34672865

A more interesting example of transformers learning a process may be [1].

There's a large literature on applying language models to reasoning tasks, but not much on what's actually going on inside them. But see for example [2]. Also, https://transformer-circuits.pub/ has a body of work on it, though still at a very early stage (see in particular "In-context Learning and Induction Heads").

[1] Extraction of organic chemistry grammar from unsupervised learning of chemical reactions https://www.science.org/doi/10.1126/sciadv.abe4166

[2] Analyzing the Structure of Attention in a Transformer Language Model https://arxiv.org/abs/1906.04284
