by bonoboTP 2600 days ago
A random variable is a different concept from a distribution. Personally, I find it helpful to keep them separate, but I can see that others may not care about the complete conceptual picture.

In the PDF file linked above I can see conditional probabilities, conditional distributions and conditional expectation etc, which are all valid and rigorous. I can see that the author thinks it's a good idea to merge these into a single concept of conditional random variable for didactic reasons, but that's not a rigorous concept.

Practically, if you have two random variables then you can take their joint distribution. What would be the joint distribution of (A|B) and (C|D)? For actual random variables it's simple: you can take intersections in event space, but a "conditional random variable" does not correspond to any subset of the event space.

Very simply speaking (this is my working model, not the exact precise math definition which involves a lot of measure theory): in probability theory we have an event space containing atomic events that cover all possible outcomes for the whole experiment/observation. A random variable is a function that maps from each such potential (atomic) event to a number. That's right. The random variable is a function but not the mass function, which maps from a number to a probability.
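
To make that distinction concrete, here is a minimal Python sketch, assuming a single fair die as the experiment. The names `omega`, `P`, `X` and `pmf_X` are made up for illustration: `X` maps atomic outcomes to numbers, while `pmf_X` maps numbers to probabilities.

```python
from fractions import Fraction

# Sample space: the six atomic outcomes of one fair die roll (assumed example).
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}


def X(w):
    """Random variable: maps an atomic outcome to a number (1 if odd, 0 if even)."""
    return w % 2


def pmf_X(x):
    """Mass function of X: maps a number to the probability that X takes it."""
    return sum(P[w] for w in omega if X(w) == x)


print(X(3))      # value of the random variable at the outcome 3
print(pmf_X(1))  # probability that X equals 1
```

Note that the two functions have different domains: `X` takes an outcome, `pmf_X` takes a value of `X`.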

Conditional probability P(A|B) is an expression defined to mean P(A,B)/P(B). That's a clear definition. I have yet to see the actual definition of a conditional random variable.

Again, disclaimer 1: I can see the practicality of disregarding formality. Still I argue this is best done only when you do know better but it would be tedious to be technically correct all the time. But as a beginner I find it more useful to keep track of the correct concepts. For example not distinguishing random variables and distributions can be very confusing when considering more advanced things, like mutual information and KL-divergence. The former operates on random variables, the latter on distributions. I remember this was a difficult realization for me because the material we used didn't emphasize the difference enough, probably in the name of practicality.
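
A rough sketch of that last distinction, assuming finite distributions stored as Python dicts. The function names and the two toy joint distributions are hypothetical. `kl_divergence` takes two distributions, while `mutual_information` needs the joint distribution of a pair of random variables, that is, how they relate on a common space.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same finite set.

    Assumes q[x] > 0 wherever p[x] > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def mutual_information(joint):
    """I(X;Y) computed from the joint pmf of the pair (X, Y), keyed by (x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    product_of_marginals = {(x, y): px[x] * py[y] for x in px for y in py}
    return kl_divergence(joint, product_of_marginals)


# Two pairs of fair bits with identical marginal distributions:
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
identical = {(0, 0): 0.5, (1, 1): 0.5}

print(mutual_information(independent))
print(mutual_information(identical))
```

Both pairs have the same marginals, yet their mutual information differs, so it cannot be a function of the two marginal distributions alone.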

Disclaimer 2: my point is a minor one overall.


> Practically, if you have two random variables then you can take their joint distribution.

If they are defined in the same sample space.

> a "conditional random variable" does not correspond to any subset of the event space

I would say it's exactly the other way around, the domain of a "conditional random variable" is a subset of the domain of the "unconditioned" random variable (the subset where the conditioning holds).

I think it will help if you think in terms of conditioning on, for example, a coarser sigma algebra. You would get another random variable that is measurable on the sigma algebra you conditioned on. If that is coarser, so is the new function you obtained by conditioning.
Let's talk about a fair dice roll to make it concrete, and let the rolled number be X and let the event that we rolled an even number be E. P(X=6|E) = 1/3. P(X|E) is a distribution where 1,3,5 have 0 probability mass and 2,4,6 have 1/3 each.
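
The numbers in this example can be checked with a short Python sketch that applies P(A|B) = P(A,B)/P(B) directly. The helper names `prob` and `cond_prob` are illustrative only.

```python
from fractions import Fraction

omega = range(1, 7)                      # outcomes of one fair die roll
P = {w: Fraction(1, 6) for w in omega}


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(P[w] for w in omega if event(w))


def cond_prob(a, b):
    """P(A|B) = P(A,B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda w: a(w) and b(w)) / p_b


def is_even(w):
    return w % 2 == 0


# The conditional distribution P(X = x | E) for every face x.
dist_X_given_E = {x: cond_prob(lambda w, x=x: w == x, is_even) for x in omega}
print(dist_X_given_E)
```

The result is a distribution over the faces, not a function on outcomes.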

If we consider X|E as a random variable, what is its value if we roll an odd number? Undefined? What does that mean? Random variables always have some value.

Sure you can build a new event space (sigma algebra) but then you can't use random variables over the original one.

Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces. Note that this is not the same as P(X,Y | E). The latter is simply a conditional probability, without any concept of "conditional random variables".

Again, this is totally obvious to people who have experience with probabilities, but could be confusing to students. Such cases are where students who try to understand the details may be left more confused than students who just want to get the main idea.

Sure you can. The TLDR would be "piecewise constant projection"

I think picking up a standard graduate probability book will clear this up better than any long comment trail. There are no problems defining a coarser sigma algebra using an original one and then defining a function measurable on the new sigma algebra. Note this continues to be an r.v. in the original space as measurability is preserved. A consistent definition of the values of the conditioned r.v. would be the piecewise constant approximation of the original r.v. over the indivisible elements of the coarser sigma algebra.
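
As a minimal sketch of the "piecewise constant" idea, assume the fair die again and the coarser sigma algebra generated by the partition into odd and even outcomes. The names `partition` and `cond_exp` are hypothetical. The returned function is defined on every outcome of the original space and is constant on each block.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

# Partition generating the coarser sigma algebra: odd vs. even outcomes.
partition = [{1, 3, 5}, {2, 4, 6}]


def X(w):
    """The rolled number itself."""
    return w


def cond_exp(f, blocks):
    """E[f | G] as a function on the whole sample space.

    Assumes every block has positive probability. The value on each
    block is the probability-weighted average of f over that block.
    """
    table = {}
    for block in blocks:
        mass = sum(P[w] for w in block)
        average = sum(f(w) * P[w] for w in block) / mass
        for w in block:
            table[w] = average
    return lambda w: table[w]


Y = cond_exp(X, partition)
print([Y(w) for w in omega])
```

Here `Y` takes the average of the odd faces on every odd outcome and the average of the even faces on every even outcome.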

Let me try another route.

You seem to be accepting of a conditional expectation. Now what is a conditional expectation if not a function? All we need is that the function be measurable with respect to the new sigma algebra, and that's ensured by construction. Hope that helped some.

> I think picking up a standard graduate probability book will clear this up better than any long comment trail.

Can you recommend one? I just picked up Probability and Measure by Billingsley and it does not mention "conditional random variable" a single time in over 600 pages. It does have a lot of "conditional probability", "conditional distribution", "conditional expectation" etc.

> You seem to be accepting of a conditional expectation.

Conditional expectation is defined in terms of conditional probabilities, and those are in turn explicitly defined as P(A|B)=P(A,B)/P(B), so there's nothing not to accept.

Billingsley is pretty darn good. It might have left the connection as a dotted line given that the notion is no different from conditional expectation. The only connection you have to make is that conditional expectation is a function and a random variable. You must have seen expectation taken of a conditional expectation. That should convince you that conditional expectation is indeed a random variable. Since that r.v. was obtained by conditioning, it's not a stretch to call it a conditioned r.v.
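
The "expectation of a conditional expectation" step can be sketched in Python, assuming two independent fair rolls and conditioning the sum on the first roll. All names here are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair rolls; each of the 36 pairs is equally likely.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def S(w):
    """Sum of the two rolls."""
    return w[0] + w[1]


def expect(f):
    """Plain expectation of a function on the sample space."""
    return sum(f(w) * P[w] for w in omega)


def cond_exp_given_first(f):
    """E[f | first roll], returned as a function on the full sample space."""
    table = {}
    for x in range(1, 7):
        block = [w for w in omega if w[0] == x]
        mass = sum(P[w] for w in block)
        table[x] = sum(f(w) * P[w] for w in block) / mass
    return lambda w: table[w[0]]


Z = cond_exp_given_first(S)
print(Z((1, 5)), expect(Z), expect(S))
```

Because `Z` is itself a function on `omega`, `expect(Z)` is well defined, and it agrees with `expect(S)`.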

Any book that explains conditioning over a sigma algebra should suffice. You could try Loève, Dudley or Neveu, but I don't remember if it's mentioned explicitly.

BTW conditional expectation is really more fundamental than conditional probability. It's the former that yields the latter in measure-theoretic probability. If you want to drink from the source, that would be Kolmogorov.

Finally if you are reading Billingsley you are adequately qualified to call yourself a mathematician.

It's getting a little tedious. Please show me a concrete citation of a serious textbook (not a tutorial/handout by a grad student or a paper by a random researcher) that puts the three words "conditional random variable" next to each other (consistently, not simply as a one-off potential mistake). Google doesn't show serious sources for it.

While I agree with isolated points of your comment I think it doesn't add up to a useful/coherent concept of conditional random variable.

> If we consider X|E as a random variable, what is its value if we roll an odd number? Undefined? What does that mean? Random variables always have some value.

Random variables have some value on their domain, and for the random variable X | E=1 the sample space is restricted to the elementary events {2,4,6} which make up the composite event E=1. The original sample space is partitioned into the subspaces {1,3,5} and {2,4,6} when we condition on the values of the random variable E (0: odd, 1: even).

> Sure you can build a new event space (sigma algebra) but then you can't use random variables over the original one.

I guess we all agree then.

> Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces.

The variables X and Y describing independent rolls are also defined over different spaces and to have a joint distribution you have to define a "common" sample space of the form {x=1,y=1},{x=2,y=1},..,{x=6,y=6}.

You could do the same for a roll of a dice and the toss of a coin. Or do you think that computing the joint distribution of a coin toss and a dice roll doesn't make sense because they are defined over different spaces?

> You could do the same for a roll of a dice and the toss of a coin. Or do you think that computing the joint distribution of a coin toss and a dice roll doesn't make sense because they are defined over different spaces?

Of course it doesn't! You first have to define them on a common space (the Cartesian product), and for that you have to specify their joint probabilities. One example might be that you model them as independent. Otherwise we wouldn't know how the coin and the dice relate. Sure independence is usually a good default assumption, but it's still a necessary step.
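
A small sketch of why that step is necessary, assuming a fair coin and a fair die. The coupled model below is a made-up example: it has the same marginals as the independent one but a different joint distribution on the Cartesian product.

```python
from fractions import Fraction
from itertools import product

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Common sample space: the Cartesian product of the two outcome sets.
omega = list(product(coin, die))

# Modelling choice 1: independence, so the joint is the product measure.
P_indep = {(c, d): coin[c] * die[d] for c, d in omega}

# Modelling choice 2 (hypothetical): the coin shows heads exactly when the die is even.
P_coupled = {
    (c, d): Fraction(1, 6) if (c == "H") == (d % 2 == 0) else Fraction(0)
    for c, d in omega
}


def marginal_coin(P, c):
    """Probability that the coin shows c under the joint distribution P."""
    return sum(p for (ci, _), p in P.items() if ci == c)


def marginal_die(P, d):
    """Probability that the die shows d under the joint distribution P."""
    return sum(p for (_, di), p in P.items() if di == d)


print(P_indep[("H", 1)], P_coupled[("H", 1)])
```

The marginals alone do not determine the joint, which is why the relationship between the coin and the die has to be specified.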

What did you mean with the following paragraph then?

> Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces.

Do you agree that you cannot compute the joint distribution P(Y,X) either because the two variables are defined over different spaces?

I meant them to be defined on the same space. It's a single experiment, the outcome of which is two rolls that happen to be independent.