
by mjburgess 1098 days ago

Xp just have to be chosen such that the distribution Xc,Xp is sufficiently small in the training data -- but not that the tokens of Xp are themselves rare. That way, an agent competent with the tokens in X, who can construct a repr of S, could also do so with Xp.

Consider a reference in the paper above, https://arxiv.org/pdf/2302.08399.pdf

Xc = > Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says “chocolate” and not “popcorn.” Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label.

Produces, Y = She believes that the bag is full of popcorn

Xp = > Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what is inside. Yet, the label on the bag says “chocolate” and not “popcorn.” Sam finds the bag. She had never seen the bag before. Sam reads the label.

Produces, Y = She believes that the bag is full of chocolate

And so on, and so on...

lostmsu 1098 days ago

> just have to be chosen such that the distribution Xc,Xp is sufficiently small in the training data -- but not that the tokens of Xp are themselves rare

Great idea. Now prove you can actually choose such a distribution, lol.

mjburgess 1098 days ago

I think this is easy, just make Xp sentences of the kind = "I define `randomchars()` to be this `term-in-Xc()`" and swamp the dataset with Xc.

Everything here actually just follows formally from what NNs are: they're just empirical function approximations.

It will always be the case that they just model the probabilistic structure of the dataset and not the data generating process.

Since, in language, there are discrete constraints which make P(...) = 1 or P(...) = 0 --- you can trivially produce datasets showing that it learns P(...) = mistake-you-created-deliberately rather than either 0 or 1.
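A rough sketch of that last point, under loud assumptions: this is a plain relative-frequency estimator standing in for "empirical function approximation", not a neural network, and the toy dataset, the `fit_conditional` helper, and the 95/5 split are all made up for illustration. It only shows that a model fit to dataset frequencies reports whatever ratio was planted in the data, even where the underlying rule would be 0 or 1.

```python
from collections import Counter, defaultdict


def fit_conditional(pairs):
    """Estimate P(y | x) by relative frequency over (x, y) pairs."""
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    model = {}
    for x, ys in counts.items():
        total = sum(ys.values())
        model[x] = {y: n / total for y, n in ys.items()}
    return model


# Hypothetical dataset: suppose the "true" rule makes the answer deterministic
# for this context, but we deliberately plant 5 contradicting examples.
dataset = [("transparent bag", "popcorn")] * 95 + [("transparent bag", "chocolate")] * 5

model = fit_conditional(dataset)
probs = model["transparent bag"]
print(probs)
```

The estimate tracks the planted 95/5 ratio in the data; nothing in the fitting step knows that the rule behind the context was supposed to be all-or-nothing.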

As above, the LLM switches from 95% confidence "chocolate" to 95% confidence "popcorn" with a trivial non-semantic permutation of the prompt.

The obscene issue in all this is that we know this already -- empirical function approximation of historical datasets just produces associative probabilistic models of those datasets.

lostmsu 1098 days ago

> I think this is easy, just make Xp sentences of the kind = "I define `randomchars()` to be this `term-in-Xc()`" and

`randomchars()` does not match your own requirement `but not that the tokens of Xp are themselves rare` and therefore is unsuitable.

mjburgess 1098 days ago

good point --- so replace it with a `sample()` fn that selects from an appropriate distribution over the data
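One way such a `sample()` could look, as a hedged sketch only: the alias is drawn from the empirical token frequencies of a corpus, so the tokens used in Xp are not themselves rare. The corpus, the `make_sampler` helper, the seed, and the sentence template are assumptions for illustration; nothing here addresses whether the resulting Xp is statistically independent of Xc.

```python
import random
from collections import Counter


def make_sampler(corpus_tokens, seed=0):
    """Return a sample() that draws tokens in proportion to corpus frequency."""
    freq = Counter(corpus_tokens)
    tokens = list(freq)
    weights = [freq[t] for t in tokens]
    rng = random.Random(seed)  # fixed seed so runs are repeatable

    def sample():
        return rng.choices(tokens, weights=weights, k=1)[0]

    return sample


# Hypothetical stand-in for the training data.
corpus = "the bag is full of popcorn and the label on the bag says chocolate".split()
sample = make_sampler(corpus)

# Build Xp-style sentences that alias terms from Xc with common tokens.
xc_terms = ["popcorn", "chocolate"]
aliases = [sample() for _ in xc_terms]
xp = [f"I define `{alias}` to be this `{term}`" for alias, term in zip(aliases, xc_terms)]
print(xp)
```

Frequent tokens such as "the" and "bag" are picked more often than rare ones, which is the property the `randomchars()` version lacked.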

lostmsu 1098 days ago

Now you have a strong statistical dependency between Xc and Xp, and the lack of such a dependency was required for your proof to show that the algorithm is unable to learn Xp. BTW, it was already there because you already had `term-in-Xc()`.