The GPT family of models (and all neural networks for that matter) can estimate P(X | Y), but have no way of computing whether X -> Y or X <- Y.