| > It is absolutely trivial to show Hyp2 is false
No, it's not.
> Current LLMs can produce impressive results on a set of linguistic inputs and then fail completely on others that make trivial alterations to the same underlying domain.
> Indeed: because there're no relevant prior cases to sample from in that case.
That's not what that tells us. Humans have weird failure modes that look absurd outside the context of evolutionary biology (some still look absurd), and those don't speak to any lack or presence of intelligence or complex thought. Not sure why it's so hard to grasp that LLMs are bound to have odd failure modes regardless of the above.
And "trivial" here is relative. In my experience, a "trivial" alteration often turns out to be the kind a person might not pay close attention to and be similarly tricked by. For instance, GPT-4 might solve a classic puzzle correctly, then fail the same puzzle subtly changed.
I've found more often than not, simply changing names of variables in the puzzle to something completely different can get it to solve the changed puzzle. It takes memory shortcuts but can be pulled out of that.
LLMs have failure modes that look like human failure modes too. |
E.g., do you have the capacity to reason about physics? Well, if you're extremely drunk, less so. But that capacity doesn't drop just because I permute the name of the object.
> I've found more often than not, simply changing names of variables
Yes, lol --- why do you think that is?
Because in the digitised dataset of "everything ever written" those names correspond to places in that dataset that can be sampled from by the LLM. Showing Hyp1 to be the case.
P(Hyp1 | ChangeNameMakesDifference) >>>>>> P(Hyp2 | ChangeNameMakesDifference)
To such a degree that the latter is vanishingly close to zero.
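Spelled out, that comparison is just the odds form of Bayes' rule: posterior odds = prior odds × likelihood ratio. A minimal sketch, where the function names and every number are made up purely for illustration (nothing here is measured), and where converting odds to a probability assumes Hyp1 and Hyp2 are the only two candidates:

```python
def posterior_odds(prior_odds, p_e_given_h1, p_e_given_h2):
    """Odds of Hyp1 over Hyp2 after seeing evidence E (odds form of Bayes' rule).

    prior_odds   = P(Hyp1) / P(Hyp2) before seeing E
    p_e_given_h1 = P(E | Hyp1)
    p_e_given_h2 = P(E | Hyp2)
    """
    if p_e_given_h2 <= 0:
        raise ValueError("P(E | Hyp2) must be positive to form a likelihood ratio")
    return prior_odds * (p_e_given_h1 / p_e_given_h2)


def odds_to_probability(odds):
    """Convert odds for Hyp1 into P(Hyp1 | E), assuming Hyp1 and Hyp2 are exhaustive."""
    return odds / (1.0 + odds)


# Hypothetical inputs, chosen only to illustrate the shape of the argument.
prior = 1.0    # assumption: no initial preference between Hyp1 and Hyp2
p_e_h1 = 0.9   # assumption: renaming variables usually matters under Hyp1
p_e_h2 = 0.01  # assumption: renaming variables rarely matters under Hyp2

odds = posterior_odds(prior, p_e_h1, p_e_h2)
print("posterior odds Hyp1:Hyp2 =", odds)
print("P(Hyp1 | E) =", odds_to_probability(odds))
```

The ">>>>>>" only follows if P(E | Hyp2) really is tiny relative to P(E | Hyp1); with equal likelihoods the evidence leaves the odds exactly where the prior put them.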