Well, this is a good reply, but it's mistaken. The conditional probability `P(x[0] | x[-1], x[-2], x[-3], ...)` is not the same as `P(x[0] | x[-1], x[-2], ... -> x[0])`, where `->` says we select only those cases where `x[-1], ...` brought about `x[0]`.

To see why, suppose we do have a god's-eye view of all of spacetime. `P(A | B)` always selects all instances where A follows B. `P(A | B -> A)` selects only those instances where A's following B was caused by B.

E.g., `P(ShoesWet | Raining)` is very different from `P(ShoesWet | Raining -> ShoesWet)`. In the former case, the two events in general have nothing to do with each other. To select `Raining -> ShoesWet`, even with a god's-eye view, we need more than statistics, since the events that count as `Raining -> ShoesWet` have to be selected on a non-statistical basis. For the athlete catching a ball, or the scientist designing an experiment, we're interested only in those causal cases. Certainly `P(A | B)` is an (approximate, statistical) model of `P(A | B -> A)`, but it's a very restricted, limited one. The athlete needs to estimate `P(ball-stops | catch -> ball-stops)`, NOT `P(ball-stops | catch)`, which is just any case of the ball stopping given any case of catching.
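To make the distinction concrete, here is a minimal Python sketch under loudly stated assumptions. Each event record carries a hypothetical `rain_caused_wet` flag standing in for the god's-eye causal label; it is not derivable from the other two fields. The two invented worlds have identical counts of (raining, shoes wet), so `P(ShoesWet | Raining)` estimated from co-occurrence comes out the same in both. Only the causal flag, which plain frequencies don't contain, tells them apart.

```python
from collections import namedtuple

# Hypothetical event record. `rain_caused_wet` is the assumed god's-eye
# causal label; it cannot be recovered from `raining` and `shoes_wet` alone.
Event = namedtuple("Event", ["raining", "shoes_wet", "rain_caused_wet"])

def p_wet_given_rain(events):
    """Statistical selection: every event where it was raining."""
    rainy = [e for e in events if e.raining]
    return sum(e.shoes_wet for e in rainy) / len(rainy)

def causal_share(events):
    """Among rainy, wet-shoe events, the fraction where rain did the wetting,
    i.e. the cases a Raining -> ShoesWet selection would keep."""
    both = [e for e in events if e.raining and e.shoes_wet]
    return sum(e.rain_caused_wet for e in both) / len(both)

# Two hypothetical worlds with identical (raining, shoes_wet) counts.
world_a = ([Event(True, True, True)] * 30     # rain soaked the shoes
           + [Event(True, False, False)] * 10
           + [Event(False, True, False)] * 5
           + [Event(False, False, False)] * 55)
world_b = ([Event(True, True, False)] * 30    # shoes wet for another reason,
           + [Event(True, False, False)] * 10  # e.g. a (hypothetical) sprinkler
           + [Event(False, True, False)] * 5
           + [Event(False, False, False)] * 55)

for name, world in [("A", world_a), ("B", world_b)]:
    print(name, p_wet_given_rain(world), causal_share(world))
```

Running it prints `A 0.75 1.0` and `B 0.75 0.0`: the same conditional probability, but opposite answers about how many of those cases are causal ones.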
Of course, this is a rather poor excuse for an explanation. The fact that "outside" and "raining" are close doesn't explain why "my shoes are wet". But it does get us closer to a genuine explanation in the sense that it eliminates a class of wrong possibilities from consideration: every sentence that doesn't have "outside" in proximity to "raining" downranks the generation "my shoes are wet". The model is further improved by adding more inductive relationships of this sort. For example, the presence of an open umbrella downranks ShoesWet, while the presence of "stepped in puddle" upranks it. Construct about a billion of these kinds of inductive relationships, and you end up with something analogous to an explanatory model. The structural relationships encoded in the many attention matrices of modern LLMs, in aggregate, entail the explanatory relationships needed for causal modelling.
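As a very rough illustration of cues up- and down-ranking a generation, the following Python sketch hand-codes three of the relationships mentioned above as additive log-odds weights. The weights, bias, proximity window, and tokenisation are all invented for illustration; this is a toy logistic scorer, not how attention matrices in an actual LLM work.

```python
import math

# Hypothetical, hand-set weights: each one is a single "inductive
# relationship" nudging the log-odds of generating "my shoes are wet".
WEIGHTS = {
    "outside_near_raining": 2.0,
    "umbrella_open": -1.5,
    "stepped_in_puddle": 2.5,
}
BIAS = -3.0  # hypothetical baseline log-odds when no cue is present

def near(tokens, a, b, window=3):
    """True if words a and b occur within `window` tokens of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def features(text):
    # Crude tokenisation: lowercase, drop commas and full stops, split on spaces.
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return {
        "outside_near_raining": near(tokens, "outside", "raining"),
        # Very crude: both words anywhere in the sentence.
        "umbrella_open": "umbrella" in tokens and "open" in tokens,
        "stepped_in_puddle": near(tokens, "stepped", "puddle"),
    }

def p_shoes_wet(text):
    """Sum the weights of the active cues and squash with a logistic function."""
    logit = BIAS + sum(WEIGHTS[k] for k, on in features(text).items() if on)
    return 1 / (1 + math.exp(-logit))

for s in ["I stayed in and read a book.",
          "It was raining outside.",
          "It was raining outside, but my umbrella was open.",
          "It was raining outside and I stepped in a puddle."]:
    print(round(p_shoes_wet(s), 3), s)
```

The puddle sentence scores highest and the open umbrella pulls the score below plain "raining outside". Each extra cue is just one more weighted term, which is the sense in which stacking many such relationships edges toward something explanation-shaped.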