HN Mirror

Hacker News

by mjburgess 1543 days ago

All gradient-descent (GD) learners are interpolators (cf. https://arxiv.org/abs/2012.00152), and we also know they're exponential in parameter count (cf. https://www.researchgate.net/figure/Number-of-parameters-ie-...).
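To make the kernel-machine reading of the cited paper a bit more concrete, here is a minimal sketch for the simplest case: a linear model trained by plain gradient descent on squared error, starting from zero weights. The data, learning rate and step count are made up. Because every GD update adds a multiple of some training input to the weights, the trained model's prediction can be rewritten exactly as a weighted sum of dot products with the training examples, i.e. a kernel machine with the linear kernel k(x, x') = x · x'. This is not the paper's general construction for deep networks, and it says nothing about the exponential-parameter claim.

```python
import numpy as np

# Made-up toy data: 20 training points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

# Plain gradient descent on 0.5 * mean squared error, starting from w = 0.
w = np.zeros(3)
alpha = np.zeros(20)  # per-training-example coefficients accumulated along the GD path
lr = 0.05
for _ in range(500):
    residual = X @ w - y           # shape (20,)
    step = lr * residual / len(y)  # gradient wrt w is X.T @ residual / n
    w -= X.T @ step                # the usual weight update...
    alpha -= step                  # ...recorded as a coefficient on each training point

# Prediction on a new point, computed two ways.
x_new = rng.normal(size=3)
pred_weights = x_new @ w
pred_kernel = sum(alpha[i] * (X[i] @ x_new) for i in range(len(y)))  # sum_i alpha_i * k(x_i, x_new)
print(np.isclose(pred_weights, pred_kernel))
```

The two predictions agree because the loop keeps w equal to X.T @ alpha at every step; nonlinear models only satisfy this kind of identity approximately.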

> If GPT is not exponential, then the m/p distinction falls apart.

Yes. If you have a system that implements QAWH with complexity similar to that of a known intelligent system, then I have no empirical issues with it. At that point, I think, you have a working system.

We can then ask whether it is thinking about anything, and I think that would remain an open question, depending on how it's implemented. I don't think the pattern alone would mean the system had intentionality, but my issue at this stage is the narrower empirical one: without something like a "tractable complexity class", your system is broken.

> And GPT has way too much world-knowledge, IMO, to be storing things in such a costly fashion.

This is an illusion. Knowledge, in the sense that matters here, is deterministic: to the same question, the same answer. GPT generates answers across runs that contradict one another; "the same question" (even literally the same, or, if you like, with some rephrasing) is given radically different answers.
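For context on where run-to-run variation enters mechanically, a minimal sketch: a language model scores candidate continuations, and a separate decoding step turns those scores into an answer. With made-up scores standing in for a real model's output, greedy decoding returns the same answer on every run, while temperature sampling can return different ones. This only illustrates the literal-repeat case; it does not speak to differing answers under rephrasing.

```python
import math
import random

# Hypothetical scores (logits) for three candidate answers to one fixed prompt,
# e.g. something like "The capital of France is".
answers = ["Paris", "Lyon", "Marseille"]
logits = [2.0, 0.5, 0.1]

def softmax(xs, temperature):
    """Turn scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in xs]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def greedy(scores):
    """Deterministic decoding: always return the highest-scoring answer."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return answers[best]

def sample(scores, temperature, rng):
    """Stochastic decoding: draw an answer in proportion to its probability."""
    return rng.choices(answers, weights=softmax(scores, temperature), k=1)[0]

rng = random.Random(0)
print("greedy: ", [greedy(logits) for _ in range(5)])
print("sampled:", [sample(logits, 1.0, rng) for _ in range(5)])
```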

I think all we have here is evidence of the (already known) tremendous compressibility of text data. We can, in c. 500bn numbers, compress most of the history of anything ever said. With such a databank, a machine can appear to do quite a lot.
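As a rough, hands-on illustration of how redundant ordinary prose is, a generic lossless compressor (zlib, from Python's standard library) already shrinks a short English paragraph noticeably. This is only an analogy: a language model's weights are a lossy, learned compression rather than DEFLATE, and a few hundred bytes says nothing about the 500bn-number scale mentioned above. The sample text is adapted from this comment.

```python
import zlib

# Sample text adapted from the comment above; any English prose would do.
text = (
    "I think all we have here is evidence of the (already known) tremendous "
    "compressibility of text data. We can, in c. 500bn numbers, compress most of "
    "the history of anything ever said. With such a databank, a machine can appear "
    "to do quite a lot. This isn't world knowledge; it is a symptom of how we, as "
    "language users, position related words near each other for ease of comprehension."
)
raw = text.encode("utf-8")
packed = zlib.compress(raw, 9)  # DEFLATE at the maximum compression level

print(f"{len(raw)} bytes -> {len(packed)} bytes ({len(packed) / len(raw):.0%} of original)")

# Lossless: decompressing restores the exact original bytes.
assert zlib.decompress(packed) == raw
```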

This isn't world knowledge; it is a symptom of how we, as language users, position related words near each other for ease of comprehension. Because we do this, our text can be compressed into brute statistical associations that appear to be meaningful.
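To make "brute statistical associations" concrete, here is a toy bigram model: it records nothing but which word followed which in a tiny made-up corpus, yet it can string together locally fluent sentences, including combinations that never occur verbatim in that corpus. GPT is a large neural network, not a bigram table, so treat this only as an illustration of the kind of association described above.

```python
import random
from collections import defaultdict

# Tiny made-up corpus, tokenized on whitespace.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the ball ."
).split()

# The entire "model": for each word, the list of words observed right after it.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start, length, rng):
    """Walk the table, picking each next word in proportion to how often it followed."""
    words = [start]
    for _ in range(length - 1):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(rng.choice(options))  # repeats in the list make this frequency-weighted
    return " ".join(words)

rng = random.Random(1)
print(generate("the", 12, rng))
```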

In the same way that GitHub's AI is basically just copy/pasting code from GitHub repos, GPT is just copy/pasting sentences from books.

All the code on GitHub, compressed into billions of numbers and decompressed a little: that's a "statistical space of tricks and coincidences" so large that we cannot fathom it by intuition alone. It's what makes these systems useful, but also what makes them such easy illusions.

We can, by investigating these systems scientifically as objects of study, come up with trivial hypotheses that expose their fundamentally dumb, coincidental character. There are quite a few papers now that do this; I don't have one to hand.

But, you know, investigate a model of this kind yourself: permute the input questions, examine the answers, and try to invalidate your hypothesis (as a scientist would). Can you invalidate it?

I think with only a little thought you will find it fairly trivial to do so.
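The experiment suggested above can be set up as a small harness: state a hypothesis ("the model knows X"), derive the prediction that rephrasings of one question should all get the same answer, and check it. In this sketch `ask_model` is a hypothetical stub that keys on surface wording, standing in for whatever model client you actually use; the question list is also made up.

```python
def ask_model(question: str) -> str:
    """Hypothetical stand-in for a real model call; swap in your own client here.

    This stub is deliberately brittle: it answers based on the first word only.
    """
    return "yes" if question.lower().startswith("is ") else "no"

def check_consistency(variants, model):
    """Ask every phrasing; the hypothesis survives only if all answers agree."""
    answers = {q: model(q) for q in variants}
    return answers, len(set(answers.values())) == 1

# Made-up rephrasings of one underlying question.
variants = [
    "Is a tomato a fruit?",
    "A tomato: is it a fruit?",
    "Would you classify a tomato as a fruit?",
]
answers, consistent = check_consistency(variants, ask_model)
for question, answer in answers.items():
    print(f"{answer:>3}  <- {question}")
print("hypothesis survives:", consistent)
```

With a real model, the interesting part is choosing rephrasings that should be meaning-preserving and recording how often the answers diverge.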


FeepingCreature 1542 days ago

> All GD learners are interpolators (cf. https://arxiv.org/abs/2012.00152)

If the paper is substantially correct, I concede the point. But what I've read of the reactions to it leads me to believe its conclusion is overstated.

Regarding compression vs intelligence, I already believe that intelligence, even human intelligence, is largely a matter of compressing data.

Regarding "knowledge is deterministic": setting aside that knowledge isn't deterministic even in humans, so long as GPT can instantiate agents, I consider the question of whether it "is" an agent academic. If GPT can operate over W_m and H_n, and I live in W_1 and have H_5, I just need to prompt it with evidence for my world and hidden state. Consider, for example, how GAN image generators have a notion of image quality but no inherent desire to "draw good images": to get quality out, you have to give them circumstantial evidence that the artist they are emulating is good, i.e. "- Unreal Engine ArtStation Wallpaper HQ 4K."
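The prompt-as-evidence argument can be read as ordinary Bayesian conditioning: treat the generator as a mixture over possible authors, and the prompt tag as evidence that shifts weight toward the skilled ones. A toy sketch with entirely made-up priors and likelihoods (nothing here is measured from a real image generator); the artist names and numbers are hypothetical.

```python
# Hypothetical latent "artists" an image generator might imitate:
# name -> (prior probability, typical output quality on a 0..1 scale).
ARTISTS = {"amateur": (0.7, 0.3), "professional": (0.3, 0.9)}

# Hypothetical likelihood that a prompt carries tags like "ArtStation HQ 4K", given each artist.
P_TAG = {"amateur": 0.05, "professional": 0.6}

def expected_quality(prompt_has_tag: bool) -> float:
    """Posterior-weighted quality after (optionally) conditioning on the prompt tag."""
    weights = {
        name: prior * (P_TAG[name] if prompt_has_tag else 1.0)
        for name, (prior, _) in ARTISTS.items()
    }
    total = sum(weights.values())
    return sum(weights[name] / total * ARTISTS[name][1] for name in ARTISTS)

print(f"no tag:   {expected_quality(False):.2f}")  # prior-weighted average
print(f"with tag: {expected_quality(True):.2f}")   # evidence shifts weight to 'professional'
```

Nothing in the model "wants" better images; adding the tag only changes which hypothetical author the output is conditioned on.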


FeepingCreature 1542 days ago

Also, of course, it's hard to see how DALL-E could create "a chair in the shape of an avocado" by interpolating between training samples, none of which was a chair in the shape of an avocado or anything close to one. The orthodox view, that it interpolates within a deep hierarchy of extracted features and meta-features, readily explains this feat.
