by datastoat
2157 days ago
Probability notation as used in ML and engineering has this problem of overloading p(). Probability notation as used by probabilists in maths departments is completely different: it’s more explicit, and sometimes more clunky. There’s a hybrid notation that I prefer, for example “Pr_X(x)” for the density function of random variable X at point x; you drop X if the random variable is clear from the context, and you drop x if you’re referring to the entire distribution. Or Pr_X(x|Y=y) for a conditional density. But this notation still has problems when you’re working with hairier conditional distributions, or with distributions that are neither discrete nor continuous. (Source: used to be a mathematical probabilist, now working in ML.)
It's definitely worthwhile for everyone to use the full notation at least once so they can get a feel for what's really going on. I've spoken to Bayesian ML professionals who are especially uncomfortable with Pr_X(x|Y=y) because it conditions on a zero-probability event (if Y is continuous)... of course p(x|y) does too, they just weren't thinking about it before! And (as I think you're getting at) the abbreviated p(x|y) simply throws away information, e.g. there's no way to represent the identity p_Y(x)=p_X(x) without adding back some sort of subscript.
But on the other hand, p(x|y) is obviously much cleaner visually. If you're writing out a more complex identity and the abbreviated notation isn't ambiguous, then it generally communicates the idea much more clearly because there's so much less visual noise.
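To make the trade-off concrete, here is Bayes' rule written both ways, as a rough sketch rather than anything definitive. It assumes X and Y have a joint density and that p_Y(y) > 0; the integration variable x' is just a label introduced for illustration. The last line is the identity mentioned above, which says X and Y share the same density: strip the subscripts and it collapses to the vacuous p(x) = p(x).

```latex
% Explicit form: every density carries the name of its random variable.
\[
  p_{X \mid Y}(x \mid y)
    = \frac{p_{Y \mid X}(y \mid x)\, p_X(x)}{p_Y(y)},
  \qquad
  p_Y(y) = \int p_{Y \mid X}(y \mid x')\, p_X(x')\, dx'
\]

% Abbreviated form: the name of the argument selects the density.
\[
  p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}
\]

% Needs subscripts: X and Y have the same density.
% Without them this reads p(x) = p(x), which says nothing.
\[
  p_Y(x) = p_X(x) \quad \text{for all } x
\]
```

The abbreviated line is easier to read, but only because each argument name happens to match its random variable; the third line is a case where that convention breaks.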