Hacker News
by datastoat 2157 days ago
Probability notation used in ML and engineering has this problem of overloading p(). Probability notation as used by probabilists in maths departments is completely different: it’s more explicit, and sometimes more clunky.

There’s a hybrid notation that I prefer, for example “Pr_X(x)” for the density function of random variable X at point x; you drop X if the random variable is clear from the context, and you drop x if you’re referring to the entire distribution. Or Pr_X(x|Y=y) for a conditional density. But this notation still has problems when you’re working with hairier conditional distributions, or with distributions that are neither discrete nor continuous.
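
A rough Python sketch of the Pr_X(x) convention, in case it helps to see it as code. The variables and their distributions (X ~ Normal(0, 1), Y ~ Exponential(1)) and the helper `Pr` are hypothetical, chosen only for illustration. The point is that the density is indexed by the random variable, so Pr_X(0.5) and Pr_Y(0.5) are different numbers, and leaving out x gives back the whole density.

```python
import math
from typing import Callable, Dict, Optional, Union

# Hypothetical example variables (an assumption, not from the comment):
# X ~ Normal(0, 1), Y ~ Exponential(rate 1).
DENSITIES: Dict[str, Callable[[float], float]] = {
    "X": lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi),
    "Y": lambda y: math.exp(-y) if y >= 0.0 else 0.0,
}

def Pr(var: str, x: Optional[float] = None) -> Union[float, Callable[[float], float]]:
    """Pr_var(x): density of random variable `var` at point x.

    Omitting x returns the whole density function, mirroring
    'drop x if you're referring to the entire distribution'.
    """
    density = DENSITIES[var]
    return density if x is None else density(x)

print(Pr("X", 0.5))  # Pr_X(0.5)
print(Pr("Y", 0.5))  # Pr_Y(0.5): same point, different random variable
f_X = Pr("X")        # Pr_X: the density itself, a callable
```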

(Source: I used to be a mathematical probabilist, now working in ML.)

I used to hate the way Bayesian ML people used p(...), until I realised that, strictly speaking, for a conditional variable we ought to be writing p_{X|Y=y}(x). The variable is X|Y=y, so all of that ought to be in the subscript.
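
To make p_{X|Y=y}(x) concrete, here is a small sketch assuming (X, Y) is a standard bivariate normal with correlation rho; that choice and all function names are illustrative. It computes the conditional density two ways: as the ratio p_{X,Y}(x, y) / p_Y(y), and from the standard closed form X | Y=y ~ Normal(rho*y, 1 - rho^2). The two agree, and neither involves P(Y = y), which is zero here.

```python
import math

def normal_pdf(x: float, mean: float = 0.0, var: float = 1.0) -> float:
    """Univariate normal density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def joint_pdf(x: float, y: float, rho: float) -> float:
    """p_{X,Y}(x, y) for a standard bivariate normal with correlation rho (assumed model)."""
    det = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def cond_pdf_ratio(x: float, y: float, rho: float) -> float:
    """p_{X|Y=y}(x) as the ratio p_{X,Y}(x, y) / p_Y(y).

    P(Y = y) is zero for continuous Y, so this is a ratio of densities,
    not a ratio of probabilities.
    """
    return joint_pdf(x, y, rho) / normal_pdf(y)

def cond_pdf_closed_form(x: float, y: float, rho: float) -> float:
    """Standard result for this model: X | Y=y ~ Normal(rho * y, 1 - rho^2)."""
    return normal_pdf(x, mean=rho * y, var=1.0 - rho * rho)

rho, x, y = 0.6, 0.3, 1.2
print(cond_pdf_ratio(x, y, rho), cond_pdf_closed_form(x, y, rho))
```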

It's definitely worthwhile for everyone to use the full notation at least once, so they can get a feel for what's really going on. I've spoken to Bayesian ML professionals who are especially uncomfortable with that because it conditions on a zero-probability event (if Y is continuous)... of course p(x|y) does too, they just weren't thinking about it before! And (as I think you're getting at) the abbreviated p(x|y) simply throws away information, e.g. there's no way to represent the identity p_Y(x)=p_X(x) without adding back some sort of subscript.
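
A quick illustration of why p_Y(x)=p_X(x) needs the subscripts. Assume (purely as an example) Y = -X. By change of variables p_Y(y) = p_X(-y), so the identity holds when X ~ Normal(0, 1), which is symmetric, and fails when X ~ Exponential(1). Written as p(x) = p(x) it would be a tautology and couldn't distinguish the two cases. The function names below are hypothetical.

```python
import math
from typing import Callable

Density = Callable[[float], float]

def std_normal(x: float) -> float:
    """Density of Normal(0, 1), an illustrative choice for X."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exponential(x: float) -> float:
    """Density of Exponential(rate 1), another illustrative choice for X."""
    return math.exp(-x) if x >= 0.0 else 0.0

def density_of_negation(p_X: Density) -> Density:
    """p_Y for Y = -X: p_Y(y) = p_X(-y) * |d(-y)/dy| = p_X(-y)."""
    return lambda y: p_X(-y)

points = [-2.0, -0.5, 0.0, 1.3]
for p_X in (std_normal, exponential):
    p_Y = density_of_negation(p_X)
    # Does p_Y(x) = p_X(x) hold at every test point?
    print(p_X.__name__, all(math.isclose(p_Y(t), p_X(t)) for t in points))
```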

But on the other hand, p(x|y) is obviously much cleaner visually. If you're writing out a more complex identity and the abbreviated notation isn't ambiguous, then it generally communicates the idea much more clearly, because there's so much less visual noise.