Wow, I did not expect to see David's notation here on HN. The only problem with the notation is that it becomes so second nature that you forget it's not standard!
A lot is lost here by using notation that looks like rigorous math but is actually pretty vague. For example, are X and Y indicators for the same flip? If so, they are mutually exclusive, X=Y is contradictory, and hence P(X=Y)=0. If they are samples from different flips (and your coin is the usual idealized one), then X and Y are independent random variables and P(X=Y) = P(X=1, Y=1) + P(X=0, Y=0) = 0.25 + 0.25 = 0.5.
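A small exact enumeration makes the two readings concrete. It assumes X = 1[heads] and Y = 1[tails], which is one way to get the mutually exclusive case; the helper names are hypothetical. For independent flips, X=Y counts both the "agree at 1" and "agree at 0" outcomes, while the event that both equal 1 is a different, smaller event.

```python
from fractions import Fraction
from itertools import product

# Assumed reading: X = 1[heads], Y = 1[tails]; helper names are illustrative.
HALF = Fraction(1, 2)


def heads(flip):
    return 1 if flip == "H" else 0


def tails(flip):
    return 1 if flip == "T" else 0


def p_equal_same_flip():
    """One fair flip, X = 1[heads], Y = 1[tails]: X and Y never agree."""
    return sum((HALF for f in "HT" if heads(f) == tails(f)), Fraction(0))


def p_equal_two_flips():
    """Two independent fair flips, X = 1[first heads], Y = 1[second tails]."""
    return sum((HALF * HALF for f1, f2 in product("HT", repeat=2)
                if heads(f1) == tails(f2)), Fraction(0))


def p_both_one_two_flips():
    """Same two flips, but the event {X = 1 and Y = 1}."""
    return sum((HALF * HALF for f1, f2 in product("HT", repeat=2)
                if heads(f1) == 1 and tails(f2) == 1), Fraction(0))


print(p_equal_same_flip())     # 0
print(p_equal_two_flips())     # 1/2
print(p_both_one_two_flips())  # 1/4
```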
It's just like if X~N(0,1), Y~N(0,1) and you want to know the distribution of X-Y. You need to know what the PDF of (X,Y) looks like. Well, you don't know: X and Y could be correlated or they might not be. For example, it could be that (X,Y)~N( (0,0), [(1,0),(0,1)] ) or maybe (X,Y)~N( (0,0), [(1,1/2),(1/2,1)] ). The distribution of X-Y depends on how correlated X and Y are: Var(X-Y) = Var(X) + Var(Y) - 2·Cov(X,Y) = 2 - 2·Cov(X,Y), which is 2 in the first case and 1 in the second.
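A quick check of that point, as a sketch: Var(X-Y) is the quadratic form aᵀΣa with a = (1, -1), so the two covariance matrices above give different answers even though both marginals are N(0,1). Since both examples are jointly Gaussian, X-Y is N(0, 2) in the first case and N(0, 1) in the second. The function name is hypothetical.

```python
from fractions import Fraction


def var_of_difference(cov):
    """Var(X - Y) = a^T cov a with a = (1, -1), for a 2x2 covariance matrix."""
    a = (1, -1)
    return sum(a[i] * cov[i][j] * a[j] for i in range(2) for j in range(2))


independent = [[1, 0], [0, 1]]
correlated = [[1, Fraction(1, 2)], [Fraction(1, 2), 1]]

print(var_of_difference(independent))  # 2
print(var_of_difference(correlated))   # 1
```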
- You literally identify sets with their indicators: they are the same.
- You identify the “P” operator with expectation (integration) with respect to the underlying measure, and then…
- You note that integration is linear, so you use linear operator notation everywhere you’d use P.
So if “A” is a set, you just write
P A = 0.5
This is equivalent to:
P A = P 1[ω ∈ A] = ∫ 1[ω ∈ A] dP(ω)
in Lebesgue notation.
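A minimal sketch of this style on a finite sample space, assuming two fair coin flips as the underlying measure (OMEGA, WEIGHT, indicator and P are hypothetical names, not from any library or from the notes): a set is turned into its indicator, and P is just the weighted sum, so it is linear by construction.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair coin flips, each outcome with weight 1/4.
OMEGA = list(product("HT", repeat=2))
WEIGHT = {w: Fraction(1, 4) for w in OMEGA}


def indicator(A):
    """Return the indicator function w -> 1[w in A] of a set of outcomes."""
    return lambda w: 1 if w in A else 0


def P(f):
    """Expectation of f under WEIGHT; a set is identified with its indicator."""
    if isinstance(f, (set, frozenset)):
        f = indicator(f)
    return sum(WEIGHT[w] * f(w) for w in OMEGA)


A = {w for w in OMEGA if w[0] == "H"}  # first flip is heads
B = {w for w in OMEGA if w[1] == "H"}  # second flip is heads


def g(w):
    """The random variable 2*1_A + 3*1_B."""
    return 2 * indicator(A)(w) + 3 * indicator(B)(w)


print(P(A))                         # 1/2
print(P(g) == 2 * P(A) + 3 * P(B))  # True
```

Because sets and functions go through the same P, writing P A and P g side by side is exactly the "sets are their indicators" convention.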
There’s an example on page 2 of http://www.stat.yale.edu/~pollard/Courses/241.fall2014/notes... although he’s not using measure theory there.
It can be really clean and terse, especially when doing bounds for random variables.
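As one illustration of such a bound (Markov's inequality, a standard example and not necessarily the one in the linked notes): for a nonnegative random variable X and t > 0, the set {X ≥ t} is identified with its indicator, which sits pointwise below X/t. Monotonicity and linearity of P then give the bound in one line.

```latex
\{X \ge t\} \le \frac{X}{t} \quad \text{(pointwise, since } X \ge 0\text{)}
\qquad\Longrightarrow\qquad
P\{X \ge t\} \le P\,\frac{X}{t} = \frac{P X}{t}.
```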