Hacker News
by cypress66 1095 days ago
> 8. Train an LLM on Xc and its accuracy on judging Y entailed by Xp is random.

This is clearly where the "proof" falls apart. Even in tasks where GPT-4 struggles, its accuracy will still be better than random. The bar of "better than random" is so low that even weak LLMs will be able to clear it.
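The "better than random" bar can be made concrete with a significance test. A minimal sketch, assuming binary entailment judgments (so chance accuracy is 0.5) and a one-sided exact binomial test at an illustrative significance level of alpha = 0.05; the function names are hypothetical, not something either commenter proposed:

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= correct) if each answer is right with prob `chance`."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


def better_than_random(correct: int, total: int, chance: float = 0.5, alpha: float = 0.05) -> bool:
    """True if the observed accuracy is significantly above the chance level."""
    return binomial_p_value(correct, total, chance) < alpha
```

For example, 60 correct out of 100 binary judgments gives a one-sided p-value of roughly 0.028, so it counts as better than random at alpha = 0.05, while 55 out of 100 (p ≈ 0.18) does not.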

Moreover, you need to prove 8 not just for a single task, but for every task/domain.

What your proof basically says is "LLMs do not generalize in the slightest, for any task", and that's trivial to disprove.

I just need to be able to create a split into Xc, Xp such that accuracy on Xp is random. I think that's really quite easy.

You could put ChatGPT in a loop: take some Xc prompts and permute them with non-semantic phrases ("Alice believes that... Xc ... what did Alice believe?"), etc., until you find those cases.
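A minimal sketch of that search loop, under loud assumptions: `model` is a hypothetical callable mapping a prompt to a yes/no entailment judgment (wrapping a real chat API is not shown), the templates are illustrative stand-ins for "non-semantic phrases", and "random" is approximated by accuracy within a fixed tolerance of chance:

```python
from typing import Callable

# Hypothetical non-semantic wrappers; "{x}" marks where the original Xc prompt goes.
TEMPLATES = [
    "{x}",
    "Alice believes that {x} What did Alice believe?",
    "Consider the following. {x} Answer yes or no.",
]

Example = tuple[str, bool]  # (prompt, gold entailment label)


def accuracy(model: Callable[[str], bool], examples: list[Example], template: str) -> float:
    """Fraction of examples the (hypothetical) model judges correctly once wrapped in `template`."""
    hits = sum(model(template.format(x=prompt)) == label for prompt, label in examples)
    return hits / len(examples)


def chance_level_templates(
    model: Callable[[str], bool],
    examples: list[Example],
    templates: list[str],
    chance: float = 0.5,
    tol: float = 0.05,
) -> list[str]:
    """Return the templates whose wrapped prompts leave accuracy within `tol` of chance."""
    return [t for t in templates if abs(accuracy(model, examples, t) - chance) <= tol]
```

A fixed tolerance is a crude stand-in for a proper significance test; with a toy model that always answers "yes" when a prompt starts with "Alice", only the Alice template on a balanced example set would be returned.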

I imagine we will discover quite a large number of such non-semantic phrases that have this effect, because the tokens in those phrases, jointly with Xc, will be arbitrarily distributed in some historical data (distributed to our preference, since we select them in the search).

This seems basically obvious, right? Entailments are discretely constrained by semantics, whereas historical datasets can contain arbitrary mixtures of random distributions of syntax.

NNs only model those distributions, not the entailments, which are, at the very least, extremely discrete.