by theincredulousk 1544 days ago
Haven't looked further, but I'm wondering about that. Is that the result of training to be able to explain that specific joke, or is it generalized?

In the past these things have been misleading. Some impressive capability ends up being far more narrow than implied, so it's kind of like just storing information and retrieving it with extra steps.


From the example, it seems hard to imagine that it has been trained to explain this specific joke.

I understand language model skepticism is very big on HN, but this is impressive.

How much of human written history can be compressed and approximately stored in 540Bn parameters?

It seems to me basically certain that no compressed representation of text can be an understanding of language, so necessarily, any statistical algorithm here is always using coincidental tricks. That it takes 500bn parameters to do it, I think, is a clue that we don't even really need.

Words mean what we do with them -- you need to be here in the world with us, to understand what we mean. There is nothing in the patterns of our usage of words which provides their semantics, so the whole field of distributional analysis precludes this superstition.

You cannot, by mere statistical analysis of patterns in mere text, understand the nature of the world. But it is precisely this we communicate in text. We succeed because we are both in the world, not because "w" occurring before "d" somehow communicates anything.

Apparent correlations in text are meaningful to us, because we created them, and we have their semantics. The system must by its nature be a mere remembering.

> It seems to me basically certain that no compressed representation of text can be an understanding of language, so necessarily, any statistical algorithm here is always using coincidental tricks. That it takes 500bn parameters to do it, I think, is a clue that we don't even really need.

I think your premise contains your conclusion, which while common, is something you should strive to avoid.

I do think your opinion is a good example of the prevailing sentiment on Hacker News. To me, it seems to come from a discomfort with the fact that even "we" emerge out of the basic interactions of basic building blocks. Our brain has been able to build world knowledge "merely by" analysis of electrical impulses being transmitted to it on wires.

I have no discomfort with the notion that our bodies, which grow in response to direct causal contact with our environment, contain in-their-structure the generative capability for knowledge, imagination, skill, growth -- and so on.

I have no discomfort with the basically schizophrenic notion that the shapes of words have something to do with the nature of the world. I just think it's a kind of insanity which absolutely destroys our ability to reason carefully about the use of these systems.

That "tr" occurs before "ee" says as much about "trees" as "leaves are green" says -- it is only that *we* have the relevant semantics that the latter is meaningful when interpreted in the light of our "environmental history" recorded in our bodies, and given weight and utility by our imaginations.

The structure of text is not the structure of the world. This thesis is mad. It's a scientific thesis. It is trivial to test it. It is trivial to wholly discredit it. It's pseudoscience.

No one here is a scientist and no one treats any of this as science. Where are the criteria for the empirical adequacy of NLP systems as models of language? Specifying any, conducting actual hypothesis tests, and establishing a theory of how NLP systems model language -- this would immediately reveal the smoke-and-mirrors.

The work to reveal the statistical tricks underneath them takes years, and no one has much motivation to do it. The money lies in this sales pitch, and this is no science. This is no scientific method.

Agree to disagree. I think you are opining about things that you are lacking fundamental knowledge on.

> The structure of text is not the structure of the world. This thesis is mad. It's a scientific thesis. It is trivial to test it. It is trivial to wholly discredit it. It's pseudoscience.

It's unclear what you even mean by that. Are the electrical impulses coming to our brain the "structure of the world"?

The structure of having X apples in Y buckets is the same as the structure in the expression "X * Y", as long as the expression exists in a context that can parse it using the rules of arithmetic, such as a human, or a calculator.

These language models lack context, not just for arithmetic, but for everything. They can't parse "X * Y" for any X and Y; they've just associated the expression with the right answer for so many values of X and Y that we get fooled into thinking they know the rules.

We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.
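To make the distinction concrete, here is a toy Python sketch contrasting a system that memorizes answers to "X * Y" with one that parses the expression and applies the rule of multiplication. The function names and the "training" range (every product of two numbers below 100) are hypothetical, and this only illustrates the argument being made; it is not a claim about how PaLM or any other language model works internally.

```python
from typing import Dict, List, Optional, Tuple


def rule_based_multiply(expression: str) -> int:
    """Parse 'X * Y' and apply the rule of multiplication to any integers."""
    left, right = expression.split("*")
    return int(left) * int(right)  # int() tolerates surrounding spaces


def build_lookup_table(pairs: List[Tuple[int, int]]) -> Dict[str, int]:
    """Memorize the answer for every expression seen during 'training'."""
    return {f"{x} * {y}": x * y for x, y in pairs}


def lookup_multiply(table: Dict[str, int], expression: str) -> Optional[int]:
    """Return the memorized answer, or None for an expression never seen."""
    return table.get(expression)


# Hypothetical 'training data': every product of two numbers below 100.
seen = [(x, y) for x in range(100) for y in range(100)]
table = build_lookup_table(seen)

for expr in ["12 * 34", "123 * 456"]:
    print(expr, rule_based_multiply(expr), lookup_multiply(table, expr))
# 12 * 34 408 408
# 123 * 456 56088 None
```

Both approaches agree on every expression inside the memorized range, so testing only within that range cannot tell them apart; the difference shows up on "123 * 456", where only the rule-based version answers.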

> No one here is a scientist and no one treats any of this as science. Where are the criteria for the empirical adequacy of NLP systems as models of language? Specifying any, conducting actual hypothesis tests, and establishing a theory of how NLP systems model language -- this would immediately reveal the smoke-and-mirrors.

What do you mean?

I'm not a scientist but I play one sometimes, and I managed a whole team of them working in this field.

The theory of language models is well established.

> Where are the criteria for the empirical adequacy of NLP systems as models of language?

There are lots(!?) I think the Winograd schema challenge[1] is an easy one to understand, and meets a lot of your objections because it is grounded in physical reality.

Statement:

The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.

Question:

Does "they" refer to the councilmen or the demonstrators?

The human baseline for this challenge is 92%[2]. PaLM (this Google language model) scored 90% (4% higher than the previous best)[3].
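The example above is a twin pair: swapping one word ("feared" vs "advocated") flips which group "they" refers to. A minimal Python sketch of how such a pair might be represented and scored follows. The `WinogradItem` class, `accuracy` function, and `always_first` resolver are hypothetical names for illustration; this is not the official dataset format or the evaluation procedure used in the PaLM paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class WinogradItem:
    sentence: str
    candidates: Tuple[str, str]
    answer: str


TEMPLATE = ("The city councilmen refused the demonstrators a permit "
            "because they {verb} violence.")
CANDIDATES = ("the councilmen", "the demonstrators")

# The twin sentences differ by one word, and that word flips the referent of "they".
ITEMS = [
    WinogradItem(TEMPLATE.format(verb="feared"), CANDIDATES, "the councilmen"),
    WinogradItem(TEMPLATE.format(verb="advocated"), CANDIDATES, "the demonstrators"),
]


def accuracy(predict: Callable[[WinogradItem], str], items: List[WinogradItem]) -> float:
    """Fraction of items where the predicted referent matches the answer."""
    correct = sum(predict(item) == item.answer for item in items)
    return correct / len(items)


def always_first(item: WinogradItem) -> str:
    """A resolver that ignores the sentence and always picks the first candidate."""
    return item.candidates[0]


print(accuracy(always_first, ITEMS))  # 0.5
```

Because each pair has opposite answers, any resolver that ignores the changed word scores at most 50% on the pair; reported figures such as 90% or 92% are this kind of accuracy computed over many such items.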

[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge

[2] http://ceur-ws.org/Vol-1353/paper_30.pdf

[3] https://storage.googleapis.com/pathways-language-model/PaLM-... pg 12

Indeed, none of these tests are tests of empirical adequacy, which really evidences the point. The whole field is in this insular pseudoscientific mould of "it's true if it passes an automated test to x%".

A theory with empirical adequacy would require you to do some actual research into language use in humans; all of its features; how it works; various theories of its mechanisms, etc. And after comprehensive, experimental and detailed theoretical work -- show that NLP models even *any* of it.

I.e., that any NLP model is a model of language.

All you do above is design your own win condition, and say you've won. This precludes actually knowing anything about how language works, and is profoundly pseudoscientific. If you set up tests for toys, and they pass -- good, you've made a nice toy.

You may only claim it models some target after actually doing some science.

Ok, boomer.

> Words mean what we do with them -- you need to be here in the world with us, to understand what we mean

This is like saying "humans can't fly because flight requires flapping wings under your own power". Sure, it's true given the definition this statement is employing, but so what? Nothing of substance is learned by definition. We certainly are not learning about any fundamental limitations of humans from such a definition. Similarly, defining understanding language as "the association of symbols with things/behaviors in the world" demonstrates nothing of substance about the limits of language models.

But beyond that, it's clear to me the definition itself is highly questionable. There are many fields where the vast majority of uses of language do not directly correspond with things or behaviors in the world. Pure math is an obvious example. The understanding of pure math is a purely abstract enterprise, one constituted by relationships between other abstractions, bottoming out at arbitrary placeholders (e.g. the number one is an arbitrary placeholder situated in a larger arithmetical structure). By your definition, a language model without any contact with the world can understand purely abstract systems as well as any human. But this just implies there's something to understanding beyond merely associations of symbols with things/behaviors in the physical world.