Thanks for your comments! I don't see how what is discussed here conflicts with the notation I introduced in the post. Do you still believe there is a soundness issue in what I have written?
I'm just saying that notation such as A * X|Y * B seemed unfamiliar to me. I only know conditional notation inside a P(...), an expectation, etc. Apparently your way of writing is used by others as well, but it may be good to know that it is not fully rigorous.
Again, different people prefer different presentations. As a student I was often frustrated by abuse of notation, and it often confused me when I was trying to understand something in detail. For a more cursory and "practical" understanding it could be good enough.
> it may be good to know that it is not fully rigorous
What is the problem with A|B=b being a random variable? (Apart from your unfamiliarity with the concept, I mean.)
Edit: I'm not saying there are no problems; I'm asking what you think the problem is. There is no problem in the discrete case. In the continuous setting things are indeed more complicated (but if the limiting process is well defined there are no issues).
Note that the same lack of rigour that you find in conditional random variables affects conditional probabilities. If you can accept the latter there is no reason to reject the former.
A random variable is a different concept from a distribution. For me personally it is helpful to keep them separate, but I can see that others may not care about the complete conceptual picture.
In the PDF file linked above I can see conditional probabilities, conditional distributions and conditional expectation etc, which are all valid and rigorous. I can see that the author thinks it's a good idea to merge these into a single concept of conditional random variable for didactic reasons, but that's not a rigorous concept.
Practically, if you have two random variables then you can take their joint distribution. What would be the joint distribution of (A|B) and (C|D)? For actual random variables it's simple: you can take intersections in event space, but a "conditional random variable" does not correspond to any subset of the event space.
Very simply speaking (this is my working model, not the exact precise math definition which involves a lot of measure theory): in probability theory we have an event space containing atomic events that cover all possible outcomes for the whole experiment/observation. A random variable is a function that maps from each such potential (atomic) event to a number. That's right. The random variable is a function but not the mass function, which maps from a number to a probability.
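To make the function-versus-mass-function distinction concrete, here is a minimal sketch. The two-dice experiment and the names `omega`, `prob`, `X` and `pmf` are my own illustrative choices, not anything from the post.

```python
from fractions import Fraction
from itertools import product

# Event space (assumed for illustration): ordered outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, len(omega)) for w in omega}

def X(w):
    """Random variable: maps an atomic outcome to a number (here, the sum)."""
    return w[0] + w[1]

def pmf(rv):
    """Mass function of rv: maps a number to a probability."""
    out = {}
    for w, p in prob.items():
        out[rv(w)] = out.get(rv(w), Fraction(0)) + p
    return out

pmf_X = pmf(X)
```

Note that `X` takes an outcome such as `(3, 4)`, while `pmf_X` is keyed by numbers such as `7`. They are different objects with different domains.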
Conditional probability P(A|B) is an expression defined to mean P(A,B)/P(B), for P(B) > 0. That's a clear definition. I have yet to see the actual definition of a conditional random variable.
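As a sanity check of that definition, here is a small sketch in the discrete case. It assumes a two-dice experiment, and the helper names `P` and `cond_P` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice, every ordered outcome equally likely.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}

def P(event):
    """Probability of an event, given as a predicate on atomic outcomes."""
    return sum(p for w, p in prob.items() if event(w))

def cond_P(a, b):
    """P(A|B) = P(A,B) / P(B); only defined when P(B) > 0."""
    pb = P(b)
    if pb == 0:
        raise ValueError("conditioning event has probability zero")
    return P(lambda w: a(w) and b(w)) / pb

A = lambda w: w[0] + w[1] == 8   # the sum is 8
B = lambda w: w[0] == 3          # the first die shows 3
result = cond_P(A, B)            # (1/36) / (6/36) = 1/6
```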
Again, disclaimer 1: I can see the practicality of disregarding formality. Still, I'd argue this is best done only when you do know better and it would merely be tedious to be technically correct all the time. As a beginner I find it more useful to keep track of the correct concepts. For example, not distinguishing random variables from distributions can be very confusing when considering more advanced things, like mutual information and KL-divergence. The former operates on random variables, the latter on distributions. I remember this was a difficult realization for me because the material we used didn't emphasize the difference enough, probably in the name of practicality.
> Practically, if you have two random variables then you can take their joint distribution.
If they are defined in the same sample space.
> a "conditional random variable" does not correspond to any subset of the event space
I would say it's exactly the other way around: the domain of a "conditional random variable" is a subset of the domain of the "unconditioned" random variable (the subset where the conditioning holds).
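In the discrete case this reading can be sketched as follows: restrict the sample space to the outcomes where the condition holds, renormalize the measure, and keep the same function. The two-dice setup and the names `omega_B`, `prob_B` and `pmf_on` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}

def X(w):
    return w[0] + w[1]      # unconditioned random variable: the sum

def B(w):
    return w[0] == 3        # conditioning event: the first die shows 3

# Restricted domain: the subset of omega where the conditioning holds.
omega_B = [w for w in omega if B(w)]
p_B = sum(prob[w] for w in omega_B)
# Renormalized measure on the restricted domain.
prob_B = {w: prob[w] / p_B for w in omega_B}

def pmf_on(rv, measure):
    """Mass function of rv under the given measure."""
    out = {}
    for w, p in measure.items():
        out[rv(w)] = out.get(rv(w), Fraction(0)) + p
    return out

# X restricted to omega_B, with measure prob_B, plays the role of "X | B".
pmf_X_given_B = pmf_on(X, prob_B)
```

The resulting mass function agrees with the usual P(X=x, B)/P(B), at least in this finite setting.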
I think it will help if you think in terms of conditioning on a sigma algebra (for example, a coarser one). You would get another random variable that is measurable with respect to the sigma algebra you conditioned on. If that sigma algebra is coarser, so is the new function you obtained by conditioning.
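A finite sketch of that idea, reading "conditioning on a sigma algebra" as the conditional expectation E[X | G] (that reading is my assumption). Here G is generated by the partition of two-dice outcomes by the first die; `block` and `cond_exp` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}

def X(w):
    return w[0] + w[1]

def block(w):
    """Label of the partition block containing w (generates the coarser G)."""
    return w[0]

def cond_exp(rv, label):
    """E[rv | G] as a function on omega, constant on each partition block."""
    num, den = {}, {}
    for w, p in prob.items():
        k = label(w)
        num[k] = num.get(k, Fraction(0)) + rv(w) * p
        den[k] = den.get(k, Fraction(0)) + p
    return lambda w: num[label(w)] / den[label(w)]

Y = cond_exp(X, block)
```

`Y` is still a function on the full sample space, but it can only distinguish outcomes that the coarser partition distinguishes: `Y((3, 1))` and `Y((3, 6))` are the same value.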