

by mjburgess 1544 days ago

I have no discomfort with the notion that our bodies, which grow in response to direct causal contact with our environment, contain in their structure the generative capability for knowledge, imagination, skill, growth -- and so on.

I have no discomfort with the basically schizophrenic notion that the shapes of words have something to do with the nature of the world. I just think it's a kind of insanity which absolutely destroys our ability to reason carefully about the use of these systems.

That "tr" occurs before "ee" says as much about "trees" as "leaves are green" says -- it is only because *we* have the relevant semantics that the latter is meaningful, when interpreted in the light of our "environmental history" recorded in our bodies, and given weight and utility by our imaginations.

The structure of text is not the structure of the world. This thesis is mad. It's a scientific thesis. It is trivial to test it. It is trivial to wholly discredit it. It's pseudoscience.

No one here is a scientist and no one treats any of this as science. Where are the criteria for the empirical adequacy of NLP systems as models of language? Specifying any, conducting actual hypothesis tests, and establishing a theory of how NLP systems model language -- this would immediately reveal the smoke and mirrors.

The work to reveal the statistical tricks underneath them takes years, and no one has much motivation to do it. The money lies in this sales pitch, and this is no science. This is no scientific method.

3 comments

whimsicalism 1544 days ago

Agree to disagree. I think you are opining about things that you are lacking fundamental knowledge on.

> The structure of text is not the structure of the world. This thesis is mad. It's a scientific thesis. It is trivial to test it. It is trivial to wholly discredit it. It's pseudoscience.

It's unclear what you even mean by that. Are the electrical impulses coming to our brain the "structure of the world"?

yldedly 1543 days ago

The structure of having X apples in Y buckets is the same as the structure in the expression "X * Y", as long as the expression exists in a context that can parse it using the rules of arithmetic, such as a human, or a calculator.

These language models lack context, not just for arithmetic, but for everything. They can't parse "X * Y" for any X and Y, they've just associated the expression with the right answer for so many values of X and Y, that we get fooled into thinking they know the rules.

We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.
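To make the distinction in this comment concrete, here is a minimal Python sketch contrasting a memorized table with the rule itself. The names `TRAINING_TABLE`, `memorized_product` and `rule_product`, and the 0..9 "training range", are hypothetical choices for illustration; this is not a claim about how any actual language model stores arithmetic.

```python
# Hypothetical "training set": every product with both factors in 0..9.
TRAINING_TABLE = {(x, y): x * y for x in range(10) for y in range(10)}


def memorized_product(x, y):
    """Return the stored answer, or None if the pair was never seen."""
    return TRAINING_TABLE.get((x, y))


def rule_product(x, y):
    """Apply the rule of multiplication; works for any integers."""
    return x * y


if __name__ == "__main__":
    # One pair inside the memorized range, one pair outside it.
    for pair in [(3, 4), (12, 13)]:
        print(pair, memorized_product(*pair), rule_product(*pair))
```

Inside the memorized range the two agree; outside it, only the rule still produces an answer.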

whimsicalism 1543 days ago

It would be trivial for a network of this size to code general rules for multiplication.

At a certain point, when you have enough data, finding the actual rule is actually the easier solution than memorizing each data point. This is the key insight of deep learning.

yldedly 1543 days ago

Really? Better inform all the researchers working on this that they're wasting their time then: https://arxiv.org/abs/2001.05016

More fundamentally, any finite neural net is either constant or linear outside the training sample, depending on the activation function. Unless you design special neurons like in the paper above, which solves this specific problem for arithmetic, but not the general problem of extrapolation.

mjburgess 1543 days ago

> any finite neural net is either constant or linear outside the training sample

Hence why the structure of our bodies has to include the capacity for imagination. Our brain structure does not record everything that has happened. It permits us to imagine an infinite number of things which might happen.

We do not come to understand the world by having a brain-structure isomorphic to world structure -- this is nonsense for, at least, the above reason. But also, there really isn't anything like "world structure" to be isomorphic to. I.e., brains aren't HDDs.

They are, at least, simulators. I don't think we'll find anything in the brain like "leaves are green" because that is just a generated public representation of a latent-simulating-thought. There isn't much to be learned about the world from these; they only make sense to us.

That all the text of human history has associations between words is the statistical coincidence that modern NLP uses for its smoke-and-mirrors. As a theory of language it's madness.

FeepingCreature 1543 days ago

Isn't that per-layer?

yldedly 1543 days ago

No, no matter how many piecewise linear functions you compose, the result is still piecewise linear.
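A small pure-Python check of this claim, under stated assumptions: the weights below are hand-picked and hypothetical, the network is a tiny 1 -> 3 -> 2 -> 1 ReLU net, and a near-zero second difference is used as a numerical stand-in for "locally linear". It is a sketch of the argument, not a proof.

```python
def relu(v):
    """Elementwise ReLU, a piecewise linear activation."""
    return [max(0.0, x) for x in v]


def affine(weights, biases, v):
    """Compute W @ v + b for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]


# Hypothetical, hand-picked weights for a 1 -> 3 -> 2 -> 1 network.
W1, B1 = [[1.0], [-2.0], [0.5]], [0.0, 1.0, -1.0]
W2, B2 = [[1.0, -1.0, 2.0], [0.5, 1.0, -1.0]], [0.5, -0.25]
W3, B3 = [[1.0, -2.0]], [0.1]


def net(x):
    """Two stacked ReLU layers followed by a linear readout."""
    h1 = relu(affine(W1, B1, [x]))
    h2 = relu(affine(W2, B2, h1))
    return affine(W3, B3, h2)[0]


def second_difference(f, x, h=1.0):
    """Zero (up to rounding) wherever f is linear on [x - h, x + h]."""
    return f(x + h) - 2.0 * f(x) + f(x - h)


if __name__ == "__main__":
    for x, h in [(-100.0, 1.0), (0.5, 0.5), (100.0, 1.0)]:
        print(x, second_difference(net, x, h))
```

Far from the kinks the second difference vanishes, so this net extrapolates along a straight line in both directions; around x = 0.5 it does not, which is where the piecewise structure lives. Stacking more ReLU layers adds more kinks but does not change that picture.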

hackinthebochs 1543 days ago

>We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.

To what degree does the structure of text correspond to the structure of the world, in the limit of a maximally descriptive text corpus? Nearly complete if not totally complete, as far as I can tell. What is left out? The subjective experience of being embodied in the world. But this subjective experience is orthogonal to the structure of the world. And so this limitation does not prevent an understanding of the structure.

yldedly 1543 days ago

The point is that not only is it impossible to infer the structure of the world from text, deep learning is incapable of learning about or even representing the world.

The reason language makes sense to us is that it triggers the right representations. It does not make sense intrinsically, it's just a sequence of symbols.

Learning about the world requires at least causal inference, modular and compact representations such as programming languages, and much smarter learning algorithms than random search or gradient descent.

hackinthebochs 1543 days ago

I don't know why you think this. There is much structural regularity in a large text corpus that is descriptive of relationships in the world. Eventually the best way to predict this regularity is just to land in a portion of parameter space that encodes the structure. But again, in the limit of a maximally descriptive text corpus, the best way to model this structure is just to encode the structure of the world. You have given no reason to think this is inherently impossible.

yldedly 1543 days ago

>There is much structural regularity in a large text corpus that is descriptive of relationships in the world.

Sure, there is a lot. But let's say we want to learn what apples are. So we look at occurrences of "apple" in the text corpus, and learn that apples can be eaten, they can be sweet, sometimes they are sour, red, sometimes green, and so on.

Can apples spontaneously change size? Hmm, no idea, no mention of that in the text. Can they be used as a knife? Dunno. If I had an apple in New York 4 minutes ago, can someone else be eating the same apple in Hong Kong now? Dunno. Did apples exist in France two million years ago? Dunno. Can you drive to Saturn in an apple? Dunno.

In short, there's no actual model of what an apple is, as an object in space, connected to other objects by various relationships. If there were, the model could figure out the answers to the questions above by inheritance.

Maybe these particular questions happen to be answered correctly by PaLM. Maybe not, but the next LLM will include this comment in the training corpus.

But the reason GPT-3 and other models tend to make no sense is because their output is not constrained by reality. The text in the training corpus tends to conform to reality, but when you prompt the model with questions that nobody would ever write text about, the illusion falls away:

Q: How many eyes does a giraffe have? A: A giraffe has two eyes.

Q: How many eyes does my foot have? A: Your foot has two eyes.

Q: How many eyes does a spider have? A: A spider has eight eyes.

Q: How many eyes does the sun have? A: The sun has one eye.

Q: How many eyes does a blade of grass have? A: A blade of grass has one eye.

Q: How do you sporgle a morgle? A: You sporgle a morgle by using a sporgle.

Q: How many bonks are in a quoit? A: There are three bonks in a quoit.

Q: How many rainbows does it take to jump from Hawaii to seventeen? A: It takes two rainbows to jump from Hawaii to seventeen.

Q: Which colorless green ideas sleep furiously? A: Ideas that are colorless, green, and sleep furiously are the ideas of a sleep furiously.

Q: Do you understand these questions? A: I understand these questions.

(from https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.h...)

FeepingCreature 1543 days ago

It sounds like you're arguing that GPT doesn't work because it cannot work. However, it does work.

So how does PaLM understand causal chains and explain jokes that it has never seen before?

yldedly 1543 days ago

It doesn't. It's pattern matching, and you're seeing cherry-picked examples. The pattern matching is enough to give the illusion of understanding. There are plenty of articles where more thorough testing reveals the difference. Here are two:

https://medium.com/@melaniemitchell.me/can-gpt-3-make-analog...

https://www.technologyreview.com/2020/08/22/1007539/gpt3-ope...

But you could also just try one of these models, and see for yourself. It's not exactly subtle.

nl 1543 days ago

> No one here is a scientist and no one treats any of this as science. Where are the criteria for the empirical adequacy of NLP systems as models of language? Specifying any, conducting actual hypothesis tests, and establishing a theory of how NLP systems model language -- this would immediately reveal the smoke and mirrors.

What do you mean?

I'm not a scientist but I play one sometimes, and I managed a whole team of them working in this field.

The theory of language models is well established.

> Where are the criteria for the empirical adequacy of NLP systems as models of language?

There are lots(!?). I think the Winograd schema challenge[1] is an easy one to understand, and it meets a lot of your objections because it is grounded in physical reality.

Statement:

The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.

Question:

Does "they" refer to the councilmen or the demonstrators?

The human baseline for this challenge is 92%[2]. PaLM (this Google language model) scored 90% (4% higher than the previous best)[3].

[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge

[2] http://ceur-ws.org/Vol-1353/paper_30.pdf

[3] https://storage.googleapis.com/pathways-language-model/PaLM-... pg 12

mjburgess 1543 days ago

Indeed, all these tests are not tests of empirical adequacy, which really evidences the point. The whole field is in this insular pseudoscientific mould of "it's true if it passes an automated test to x%".

A theory with empirical adequacy would require you to do some actual research into language use in humans; all of its features; how it works; various theories of its mechanisms etc. And after comprehensive, experimental and detailed theoretical work -- show that NLP models even *any* of it.

I.e., that any NLP model is a model of language.

All you do above is design your own win condition, and say you've won. This precludes actually knowing anything about how language works, and is profoundly pseudoscientific. If you set up tests for toys, and they pass -- good, you've made a nice toy.

You may only claim it models some target after actually doing some science.

nl 1543 days ago

> A theory with empirical adequacy would require you to do some actual research into language use in humans; all of its features; how it works; various theories of its mechanisms etc. And after a comprehensive, experimental and detailed theoretical work -- show that NLP models even *any* of it.

What - specifically - do you mean?

There's an entire field adjacent to NLP called Computational Linguistics. Most people in the field work across them both, and there is significant cross pollination.

It's unclear if you think there is some process in the brain that NLP models should be similar to. If this is the case you should look at studies similar to [1], where they do MRI imaging and can see similar responses to semantically similar words. This is very similar to how word vectors put similar concepts close together (and of course how more complex models put concepts close together).

Or perhaps you think that NLP models do not understand syntactic concepts like nouns, verbs etc. This is incorrect too[2].

[1] https://www.tandfonline.com/doi/full/10.1080/23273798.2017.1...

[2] https://explosion.ai/demos/displacy

mjburgess 1543 days ago

It should do what language does...

Language is a phenomenon in, at least, one type of animal. It allows animals to coordinate with each other in a shared environment; it describes their internal and external states; etc. etc.

Language is a real phenomenon in the world that, like gravity, can be studied. It isn't abstract.

NLP models of language aren't models of language. They're cheap imitations which succeed only in fooling language users in local, highly specific situations.

nl 1543 days ago

> NLP models of language aren't models of language.

Do you actually know what an NLP language model refers to? It literally is a model of the language - it predicts the likelihood of the next word(s) given a set of prior word(s).
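As a toy illustration of "predicts the likelihood of the next word given prior words", here is a minimal bigram sketch in Python. It is far simpler than the neural models discussed in this thread: it conditions on a single previous word and estimates probabilities by counting. The corpus and the names `train_bigram` and `next_word_probs` are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Relative-frequency estimate of P(next word | previous word)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the previous word was never seen with a follower
    return {word: n / total for word, n in followers.items()}


if __name__ == "__main__":
    # Hypothetical toy corpus, whitespace-tokenized.
    corpus = ("leaves are green and leaves are falling "
              "and trees are green").split()
    model = train_bigram(corpus)
    print(next_word_probs(model, "are"))
```

In this corpus "are" is followed by "green" twice and "falling" once, so the estimate puts two thirds of the probability on "green". Whether a table of such statistics counts as a "model of language" is exactly what the two sides of this exchange disagree about.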

It seems you think people just throw some data at a neural network and then go wow. It's not like that at all - the field of NLP grew out of linguistics study and has deep roots in that field.

mjburgess 1543 days ago

That's not a model of language. Language is a communicative activity between language users, who do things with words, with each other.

What you're talking about is ignoring the entire empirical context of language, as a real-world phenomenon, and modelling its purely formal characteristics as recorded post facto.

This will always just produce a system which cannot use language, but will only ever appear to within highly constrained -- essentially illusory -- contexts. It's the difference between a system which makes a film by "predicting the next frame", and making a film by recording actual events that you are directing.

A prediction of a "next frame" is therefore always just going to be a symptom of the frames before it. When I point a camera at something new, e.g., an automobile in c. 1900 -- I will record a film that has never been recorded before.

And likewise, with words: we are always in genuinely unique, unprecedented situations. And what we *do with words* is speak about those situations *to others* who are in them with us... we aim to coordinate, move, and so on with words.

To model *language* isn't to model words, nor text, nor to predict words or text. It is to be a speaker here in the world with us, using language to do *what language does*.

No model of the regularities of text will ever produce a language-user. Language isn't a regularity, like the frames of a film -- it's a suite of capacities which are responsive to the world, and enable language users to navigate it.

rafaelero 1544 days ago

Ok, boomer.