| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by hackinthebochs 1202 days ago

>This is not a philosophical matter. We know that "statistical learning" which is nothing more than a "correlation maximisation objective" over non-phenomenological, non-causal, non-physical data produces approximate associative models of those target domains -- that have little use beyond "replaying those associations".

Why do you think the data LLMs are trained on are non-causal? Lets take causation as asymmetric correlation. That is, (A,B) present in the training data does not imply (B,A) presence. But of course human text is asymmetric in this manner and LLMs will pick up on this asymmetry. You might say that causation isn't merely about asymmetric correlation, but that of the former determining the latter. But this isn't something we observe from nature, it is an explanatory posit that humans have landed on in service to modelling the world. So causation is intrinsically explanatory, and explanation is intrinsically causal. The question is, does an LLM in the course of modelling asymmetric correlations, develop something analogous to an explanatory model. I think so, in the sense that a good statistical model will intrinsically capture explanatory relations.

Cashing out explanation and explanatory model isn't easy. But as a first pass I can say that explanatory models capture intrinsic regularity of a target system such that the model has an analogical relationship with internal mechanisms in the target system. This means that certain transformations applied to the target system has a corresponding transformation in the model that identifies the same outcome. If we view phenomena in terms of mechanistic levels with the extrinsic observable properties as the top level and the internal mechanisms as lower levels, an explanatory model will model some lower mechanistic level and recover properties of the top level.

But this is in the solution space of good models of statistical regularity of an external system. To maximally predict the next token in a sequence just requires a model of the process that generates that sequence.

2 comments

mjburgess 1202 days ago

Well this is a good reply, but it's mistaken.

The conditional probability:

    P(x[0]| x[-1], x[-2], x[-3] ...)

is not the same as,

    P(x[0] | x[-1], x[-2], ... -> x[0])

Where `->` says we select only those cases where x[-1],... brought-about x[0].

To see why this is the case, suppose we do have a god's eye-view of all of spacetime.

    P(A|B) 
    always selects for all instances where B follows A.

    P(A| B -> A) 
    selects only those instances where B's following A was caused by A.

Eg.,

    P(ShoesWet | Raining) 

    is very different from 

    P(ShoesWet | Raining -> ShoesWet)

in the former case the two events have, in general, nothing to do with each other.

To select "Raining -> ShoesWet" even with a gods-eye-view we need more than statistics... since those events which count as "Rain -> ShoeWet" have to be selected on a non-statistical basis.

For the athelete catching a ball, or the scientist designing the experiment, we're interested only in those causal cases.

For sure P(A|B) is a (approximate, statistical) model of P(A| B->A) -- but it's a very restricted, limited model.

The athlete needs to estimate P(ball-stops | catch -> ball-stops)

NOT P(ball-stops | catch) which is just any case of the ball-stopping given any case of catching.