Hacker News
by whimsicalism 1543 days ago
> It seems to me basically certain that no compressed representation of text can be an understanding of language, so necessarily, any statistical algorithm here is always using coincidental tricks. That it takes 500bn parameters to do it, I think, is a clue that we don't even really need.

I think your premise contains your conclusion, which while common, is something you should strive to avoid.

I do think your opinion is a good example of the prevailing sentiment on Hacker News. To me, it seems to come from a discomfort with the fact that even "we" emerge out of the basic interactions of basic building blocks. Our brain has been able to build world knowledge "merely by" analysis of electrical impulses being transmitted to it on wires.

I have no discomfort with the notion that our bodies, which grow in response to direct causal contact with our environment, contain in-their-structure the generative capability for knowledge, imagination, skill, growth -- and so on.

I have no discomfort with the basically schizophrenic notion that the shapes of words have something to do with the nature of the world. I just think it's a kind of insanity which absolutely destroys our ability to reason carefully about the use of these systems.

That "tr" occurs before "ee" says as much about "trees" as "leaves are green" says -- it is only that *we* have the relevant semantics that the latter is meaningful when interpreted in the light of our "environmental history" recorded in our bodies, and given weight and utility by our imaginations.

The structure of text is not the structure of the world. This thesis is mad. It's a scientific thesis. It is trivial to test it. It is trivial to wholly discredit it. It's pseudoscience.

No one here is a scientist and no one treats any of this as science. Where's the criteria for the empirical adequacy of NLP systems as models of language? Specifying any, conducting actual hypothesis tests, and establishing a theory of how NLP systems model language -- this would immediately reveal the smoke-and-mirrors.

The work to reveal the statistical tricks underneath them takes years, and no one has much motivation to do it. The money lies in this sales pitch, and this is no science. This is no scientific method.

Agree to disagree. I think you are opining about things that you are lacking fundamental knowledge on.

> The structure of text is not the structure of the world. This thesis is mad. It's a scientific thesis. It is trivial to test it. It is trivial to wholly discredit it. It's pseudoscience.

It's unclear what you even mean by that. Are the electrical impulses coming to our brain the "structure of the world"?

The structure of having X apples in Y buckets is the same as the structure in the expression "X * Y", as long as the expression exists in a context that can parse it using the rules of arithmetic, such as a human, or a calculator.

These language models lack context, not just for arithmetic, but for everything. They can't parse "X * Y" for any X and Y, they've just associated the expression with the right answer for so many values of X and Y, that we get fooled into thinking they know the rules.

We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.
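The distinction drawn above, between associating "X * Y" with the right answer for many values and actually applying the rule, can be sketched in a few lines. This is only an illustration of the two extremes under stated assumptions: `build_lookup`, `memorized_multiply`, and `rule_multiply` are hypothetical names, and nothing here claims to show what a real language model does internally.

```python
def build_lookup(max_seen):
    """Memorize every product for operands in 0..max_seen (the 'training sample')."""
    return {(x, y): x * y
            for x in range(max_seen + 1)
            for y in range(max_seen + 1)}


def memorized_multiply(table, x, y):
    """Pure association: answers only pairs that were seen, otherwise None."""
    return table.get((x, y))


def rule_multiply(x, y):
    """An explicit rule (repeated addition) that works for any integers."""
    total = 0
    for _ in range(abs(y)):
        total += x
    return total if y >= 0 else -total


table = build_lookup(9)
print(memorized_multiply(table, 7, 8))    # seen pair
print(memorized_multiply(table, 12, 13))  # unseen pair, no answer
print(rule_multiply(12, 13))              # the rule still applies
```

Inside the memorized range the two are indistinguishable, which is exactly why sampling only seen pairs cannot settle which one you are looking at.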

It would be trivial for a network of this size to code general rules for multiplication.

At a certain point, when you have enough data, finding the actual rule is an easier solution than memorizing each data point. This is the key insight of deep learning.

Really? Better inform all the researchers working on this that they're wasting their time then: https://arxiv.org/abs/2001.05016

More fundamentally, any finite neural net is either constant or linear outside the training sample, depending on the activation function. Unless you design special neurons like in the paper above, which solves this specific problem for arithmetic, but not the general problem of extrapolation.
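A minimal sketch of that claim, restricted to the simplest case I am sure of: a one-hidden-layer network with a scalar input. With ReLU units the function is piecewise linear, so past the last kink it is exactly linear; with tanh units every unit saturates, so the output flattens to a constant. The weights in `UNITS` are hand-picked and hypothetical, not taken from any trained model.

```python
import math

# Hypothetical hand-picked hidden units: (input weight, bias, output weight).
# The ReLU kinks sit at x = 2, x = 1 and x = -2.
UNITS = [(1.0, -2.0, 3.0), (-1.0, 1.0, 2.0), (2.0, 4.0, -1.0)]


def relu_net(x):
    """One hidden layer of ReLU units, scalar input and output."""
    return sum(a * max(0.0, w * x + b) for w, b, a in UNITS)


def tanh_net(x):
    """Same weights, tanh activation instead of ReLU."""
    return sum(a * math.tanh(w * x + b) for w, b, a in UNITS)


def second_difference(f, x, h=1.0):
    """Zero when f is linear on [x - h, x + h]."""
    return f(x + h) - 2.0 * f(x) + f(x - h)


# Near the kinks the ReLU net bends; far away it is a straight line.
print(second_difference(relu_net, 1.5))
print(second_difference(relu_net, 100.0))

# Far from the origin the tanh net stops changing at all.
print(tanh_net(100.0) - tanh_net(1000.0))
```

Neither behaviour tracks a target like x * x outside the region where the kinks or the unsaturated part of tanh live, which is the extrapolation problem being pointed at.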

> any finite neural net is either constant or linear outside the training sample

Hence why the structure of our bodies has to include the capacity for imagination. Our brain structure does not record everything that has happened. It permits us to imagine an infinite number of things which might happen.

We do not come to understand the world by having a brain-structure isomorphic to world structure -- this is nonsense for, at least, the above reason. But also, there really isn't anything like "world structure" to be isomorphic to. I.e., brains aren't HDDs.

They are, at least, simulators. I don't think we'll find anything in the brain like "leaves are green" because that is just a generated public representation of a latent-simulating-thought. There isn't much to be learned about the world from these, they only make sense to us.

That all the text of human history has associations between words is the statistical coincidence that modern NLP uses for its smoke-and-mirrors. As a theory of language it's madness.

Isn't that per-layer?

> We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.

To what degree does the structure of text correspond to structure of the world, in the limit of a maximally descriptive text corpus? Nearly complete if not totally complete, as far as I can tell. What is left out? The subjective experience of being embodied in the world. But this subjective experience is orthogonal to the structure of the world. And so this limitation does not prevent an understanding of the structure.

The point is that not only is it impossible to infer the structure of the world from text, deep learning is incapable of learning about or even representing the world.

The reason language makes sense to us is that it triggers the right representations. It does not make sense intrinsically, it's just a sequence of symbols.

Learning about the world requires at least causal inference, modular and compact representations such as programming languages, and much smarter learning algorithms than random search or gradient descent.

I don't know why you think this. There is much structural regularity in a large text corpus that is descriptive of relationships in the world. Eventually the best way to predict this regularity is just to land in a portion of parameter space that encodes the structure. But again, in the limit of a maximally descriptive text corpus, the best way to model this structure is just to encode the structure of the world. You have given no reason to think this is inherently impossible.

It sounds like you're arguing that GPT doesn't work because it cannot work. However, it does work.

So how does PaLM understand causal chains and explain jokes that it has never seen before?

> No one here is a scientist and no one treats any of this as science. Where's the criteria for the empirical adequacy of NLP systems as models of language? Specifying any, conducting actual hypothesis tests, and establishing a theory of how NLP systems model language -- this would immediately reveal the smoke-and-mirrors.

What do you mean?

I'm not a scientist but I play one sometimes, and I managed a whole team of them working in this field.

The theory of language models is well established.

> Where's the criteria for the empirical adequacy of NLP systems as models of language?

There are lots(!?). I think the Winograd schema challenge[1] is an easy one to understand, and meets a lot of your objections because it is grounded in physical reality.

Statement:

The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.

Question:

Does "they" refer to the councilmen or the demonstrators?

The human baseline for this challenge is 92%[2]. PaLM (this Google language model) scored 90% (4% higher than the previous best)[3].

[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge

[2] http://ceur-ws.org/Vol-1353/paper_30.pdf

[3] https://storage.googleapis.com/pathways-language-model/PaLM-... pg 12

Indeed, all these tests are not of empirical adequacy, which really evidences the point. The whole field is in this insular pseudoscientific mould of "it's true if it passes an automated test to x%".

A theory with empirical adequacy would require you to do some actual research into language use in humans; all of its features; how it works; various theories of its mechanisms etc. And after a comprehensive, experimental and detailed theoretical work -- show that NLP models even *any* of it.

I.e., that any NLP model is a model of language.

All you do above is design your own win condition, and say you've won. This precludes actually knowing anything about how language works, and is profoundly pseudoscientific. If you set up tests for toys, and they pass -- good, you've made a nice toy.

You may only claim it models some target after actually doing some science.

> A theory with empirical adequacy would require you to do some actual research into language use in humans; all of its features; how it works; various theories of its mechanisms etc. And after a comprehensive, experimental and detailed theoretical work -- show that NLP models even *any* of it.

What - specifically - do you mean?

There's an entire field adjacent to NLP called Computational Linguistics. Most people in the field work across them both, and there is significant cross pollination.

It's unclear whether you think there is some process in the brain that NLP models should be similar to. If this is the case, you should look at studies similar to [1], where they do MRI imaging and can see similar responses for semantically similar words. This is very similar to how word vectors put similar concepts close together (and of course how more complex models put concepts close together).
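To make "word vectors put similar concepts close together" concrete, here is a small sketch using cosine similarity, the usual closeness measure for word vectors. The three-dimensional vectors in `TOY_VECTORS` are invented for illustration only; they are not real embeddings from any model.

```python
import math

# Made-up toy vectors, NOT taken from any trained embedding model.
TOY_VECTORS = {
    "tree": [0.9, 0.8, 0.1],
    "leaf": [0.8, 0.9, 0.2],
    "calculator": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


related = cosine_similarity(TOY_VECTORS["tree"], TOY_VECTORS["leaf"])
unrelated = cosine_similarity(TOY_VECTORS["tree"], TOY_VECTORS["calculator"])
print(related, unrelated)  # the related pair scores higher
```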

Or perhaps you think that NLP models do not understand syntactic concepts like nouns, verbs etc. This is incorrect too[2].

[1] https://www.tandfonline.com/doi/full/10.1080/23273798.2017.1...

[2] https://explosion.ai/demos/displacy

It should do what language does...

Language is a phenomenon in, at least, one type of animal. It allows animals to coordinate with each other in a shared environment; it describes their internal and external states; etc. etc.

Language is a real phenomenon in the world that, like gravity, can be studied. It isn't abstract.

NLP models of language aren't models of language. They're cheap imitations which succeed only to fool language users in local highly specific situations.

> NLP models of language aren't models of language.

Do you actually know what an NLP Language Model refers to? It literally is a model of the language - it predicts the likelihood of the next word(s) given a set of prior word(s).

It seems you think people just throw some data at a neural network and then go wow. It's not like that at all - the field of NLP grew out of linguistics study and has deep roots in that field.
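For anyone unfamiliar with the term, the definition above ("predicts the likelihood of the next word(s) given a set of prior word(s)") can be sketched with the simplest possible case, a bigram model built from counts. This is a toy under stated assumptions: the three-sentence corpus is made up, `train_bigram` and `next_word_probs` are hypothetical names, and models like PaLM are neural networks conditioned on far longer contexts rather than count tables.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Estimate P(next word | previous word) from the counts."""
    following = counts.get(prev)
    if not following:
        return {}  # the previous word was never seen
    total = sum(following.values())
    return {word: n / total for word, n in following.items()}


# Made-up toy corpus, split on whitespace.
corpus = "leaves are green . leaves are falling . trees are green .".split()
model = train_bigram(corpus)
probs = next_word_probs(model, "are")
print(probs)
```

Whether a table of conditional probabilities like this counts as a "model of language" in the empirical sense argued for upthread is exactly what the two sides here disagree on; the code only pins down what the technical term refers to.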

Ok, boomer.