by mjburgess 961 days ago

> it is now quite well established that GPT-4 has impressive out-of-sample performance

Err... I can show this is false, kinda trivially. People who engage in prompt-confirmation-bias aren't aware of what the in-sample is.

It's basically everything ever digitised: you can ask it for anything from the first paragraph of every Dickens novel to the average petal length of an iris flower, etc.

How are you measuring the in-sample here?

If you engage in straightforward reasoning from first principles, and are basically aware of what the training data is, you can show in 10 seconds critical failures of generalisation.

If you want a recipe: go find some fringe API docs. Establish that it has been trained on them. Then, since they're fringe, there won't be much code on GitHub, etc. Now ask it to do something non-trivial with that API. It will fail, and the mechanism will be obvious: it'll jam in correlated code that lacks relevance.

Do the same on a popular API, and see it succeed.

The in-sample will be obvious for both, and so will the boundary of generalisation.

kristiandupont 961 days ago

You can make it invent a new language: https://maximumeffort.substack.com/p/i-taught-chatgpt-to-inv...

I am sure you will continue to argue that this is still in line with everything-that's-ever-written prediction, but my opinion is that at that point, it's a meaningless distinction. The human brain is also just a machine.

mjburgess 961 days ago

So I was with a financial researcher recently, and he wanted to use ChatGPT to summarise some reference financial data -- and it did so, actually correctly.

Being sceptical, as every person ought to be in these matters, I changed the financial data and performed the same analysis (both in a new tab and within the same convo). The results were the same!

How strange?

Well, because this was reference financial data, ChatGPT was reporting prior reference summaries of it. When the data was changed, it reported the very same reference summaries (which were now wrong).

Since it's incapable of actually summarising financial data. It's only capable of selecting combinations of pieces of its training set.

Now, is this distinction "meaningless"?

No, it's the difference between this guy being fired for causing a massive loss on a major project; and this guy keeping his job and doing it well.
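The check described above (change the data, rerun, see whether the summary moves) can be sketched as a small harness. This is only an illustration under stated assumptions: `responds_to_change`, `computing_summariser` and `canned_summariser` are hypothetical names, and the two summarisers are toy stand-ins, not a model of what ChatGPT does internally.

```python
from statistics import mean

def responds_to_change(summarise, original, perturbed):
    """Return True if the summary changes when the underlying data changes."""
    return summarise(original) != summarise(perturbed)

def computing_summariser(values):
    # Derives the summary from the values it is actually given.
    return f"mean={mean(values):.2f}"

def canned_summariser(values):
    # Hypothetical stand-in for the behaviour described: ignores its input
    # and repeats a previously seen summary.
    return "mean=2.00"

original = [1.0, 2.0, 3.0]
perturbed = [10.0, 20.0, 30.0]

print(responds_to_change(computing_summariser, original, perturbed))
print(responds_to_change(canned_summariser, original, perturbed))
```

With a real chat model the raw text usually varies between runs anyway, so a fair version of this test would compare the figures extracted from each answer rather than the full strings.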

famouswaffles 961 days ago

>Since it's incapable of actually summarising financial data. It's only capable of selecting combinations of pieces of its training set.

Third completely off misconception from you today.

This is not at all what it is doing. "Supercharged interpolation" is false and makes no sense. It's not a lookup table either: it doesn't memorize enough of what it would need to for your assertion to be possible.

https://arxiv.org/abs/2110.09485

mjburgess 961 days ago

at 500gb, you can store nearly everything ever written -- let alone compressed.

all statistical learning is a variation on k-nn (see the relevant paper on this) but likewise this is obvious a priori

k-nn is the ideal learner, and a good starting point for analysis

the question for any given system is: what is the learning space, what is the distance function, and how many points are being considered

NNs set up a compressed X,y space, in that space choose points via an empirical expectation, and obtain a weighted average as their prediction

That's just what they do -- there isn't any other mechanism here. The whole formal structure of the NN can be written down on a page of paper

your paper above doesn't deal with this -- it's a reply to the 'forced interpolation' view, which i haven't espoused. but often NNs are forced interpolated

'extrapolation' is of course a part of the possible predictive output of a statistical learning system -- in that its latent space is taken to be embedded in R^n and so one can 'veer off' into R.

Whenever you attribute a higher fidelity space to a small latent space you are, in effect, extrapolating

famouswaffles 961 days ago

>at 500gb, you can store nearly everything ever written -- let alone compressed.

No you cannot.

>That's just what they do -- there isn't any other mechanism here.

That's not what they do. There are many papers now showing in-context learning (ICL) performing some kind of optimization during inference, which would not be happening if all they did was retrieval.

I've come to realize you don't know what you're talking about. Your level of denial is scary to see.

mjburgess 961 days ago

just do the calculation yourself: how many books is 500gb at, say, a few bits per character?

more than all ever written -- and so on

perhaps apply a single drop of scepticism to this credulity

even, just ask chatgpt to repeat the first paragraph of some book -- say, a dickens novel
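The calculation proposed above can be written out so the inputs are explicit. This is a sketch only: the thread gives "500gb" and "a few bits per character", while the 2 bits per character and 500,000 characters per book used here are assumed values, and `books_that_fit` is a hypothetical helper name.

```python
def books_that_fit(storage_bytes, bits_per_char, chars_per_book):
    """Number of whole books that fit in the storage at the given coding rate."""
    total_bits = storage_bytes * 8
    total_chars = total_bits // bits_per_char
    return total_chars // chars_per_book

STORAGE_BYTES = 500 * 10**9   # "500gb" read as 500 decimal gigabytes
BITS_PER_CHAR = 2             # assumed stand-in for "a few bits per character"
CHARS_PER_BOOK = 500_000      # assumed average book length in characters

print(books_that_fit(STORAGE_BYTES, BITS_PER_CHAR, CHARS_PER_BOOK))  # 4000000
```

The result scales inversely with both assumed values, so halving the book length or the bits per character doubles it. Deciding whether that exceeds everything ever written needs an outside estimate of the total, which the thread does not supply.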

kristiandupont 961 days ago

>Since it's incapable of actually summarising financial data

It's not, though. It is in fact able to summarize financial data, just as it's able to write code and diagnose a medical condition. It makes mistakes, yes, even grave ones, much more so than experts in those fields would.

mjburgess 961 days ago

It isn't making mistakes ... it's never actually doing it.

Do you see a difference between the process of adding numbers and dividing by their count (taking a mean) and emitting numeric tokens which are most probable for a given input?

The former is called "taking a mean"; the latter isn't. This system never engages in any method to summarise financial data. Its method is always the same: to emit the tokens most probable given a set of historical tokens.

It's the difference between saying "the average of 1,2,3 is 2" because that sentence occurs 1,000,000 times and saying it's 2 because you've literally computed it.

This system does not run financial summary algorithms. It's a trick
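The two procedures being contrasted can be put side by side in a toy form. This is a deliberate caricature under loud assumptions: the `SEEN` table, its counts, and both function names are made up for illustration, and an exact-match frequency table is not a description of how an LLM works (the lookup-table picture is disputed elsewhere in this thread).

```python
from collections import Counter
from statistics import mean

# Toy "corpus": completions seen after a prompt, with made-up occurrence counts.
SEEN = {
    "the average of 1,2,3 is": Counter({"2": 1_000_000, "3": 12}),
}

def most_frequent_completion(prompt):
    """Return the completion seen most often for this exact prompt, or None if unseen."""
    counts = SEEN.get(prompt)
    return counts.most_common(1)[0][0] if counts else None

def computed_average(numbers):
    """Add the numbers and divide by their count."""
    return mean(numbers)

print(most_frequent_completion("the average of 1,2,3 is"))  # answer by frequency
print(computed_average([1, 2, 3]))                          # answer by computation
print(most_frequent_completion("the average of 1,2,4 is"))  # unseen prompt
```

Both routes agree on the sentence that was seen a million times; only the computed one has anything to say about inputs that were never seen.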

sweetgiorni 960 days ago

To add to your point: try asking ChatGPT to do basic arithmetic on numbers it hasn't seen before. You'll see just how good it is at computation.
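One way to set up that experiment is to generate the problems randomly and keep the exact answers for scoring. A rough sketch, assuming 12-digit operands and three operators; `make_arithmetic_probes` and `score` are hypothetical helpers, and random operands are only unlikely to appear in training data, not guaranteed to be absent. Sending the prompts to a model is left out.

```python
import random

def make_arithmetic_probes(n, digits=12, seed=0):
    """Generate n (prompt, expected_answer) pairs using large random operands."""
    rng = random.Random(seed)  # seeded so the probe set is reproducible
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    probes = []
    for _ in range(n):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        op = rng.choice(["+", "-", "*"])
        expected = {"+": a + b, "-": a - b, "*": a * b}[op]
        probes.append((f"What is {a} {op} {b}?", str(expected)))
    return probes

def score(answers, probes):
    """Fraction of answers that exactly match the expected result."""
    hits = sum(ans.strip() == exp for ans, (_, exp) in zip(answers, probes))
    return hits / len(probes)

probes = make_arithmetic_probes(5)
for prompt, _ in probes:
    print(prompt)
```

Python integers are arbitrary precision, so the expected answers stay exact even for the multiplications.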

famouswaffles 960 days ago

GPT-4 is better at it than you could manage without an external tool or scratchpad, and that's despite being severely hampered by tokenization. https://arxiv.org/abs/2310.02989

intended 961 days ago

The brain is a machine; the issue is the difference between two claims:

LLMs are enough to be a brain

LLMs are not enough to be a brain.

Txmm 959 days ago

But “everything ever digitised” includes a tonne of linguistics information - it’s still in sample.
