|
So, just to be sure: even for a uniform distribution, the PDF values can be small. Consider the uniform distribution from 0 to 10^100. The CDF for this distribution is P(X < x) = x/10^100 for x in [0, 10^100]. The derivative of this (the PDF) is p(x) = 1/10^100. This holds at every x in the range (outside [0, 10^100] the PDF is 0), which makes sense because the "speed" at which the probability is increasing is constant regardless of x.
Why are these values smaller than for the uniform distribution on [0, 1]? It's because the probability increases much more slowly per unit of x on Uniform(0, 10^100) than it does on Uniform(0, 1). Going from P(X < 0) to P(X < 1) increases the probability by only 1/10^100 for Uniform(0, 10^100), while it increases the probability by 1 for Uniform(0, 1). So PDFs can have small values regardless of whether they are uniform or not. What a small PDF at a point x indicates is that the CDF is increasing very "slowly" at that x. I'll emphasize this point: PDF values are not probabilities. They are rates of change of the CDF.
For some further understanding of the Stack Overflow post, let's consider Uniform(0, 2). The PDF is p(x) = 1/2. Suppose the author of the Stack Overflow post drew 50 samples from this distribution. Regardless of what those 50 samples were, the value he would have gotten (the product of the 50 PDF values) is (1/2)^50 = 1/(2^50), roughly 8.9 x 10^-16. Why is this so small? (I'll give a rather loose and informal explanation here, but I can be more formal if you'd like, if this doesn't make sense.)
Think back to Uniform(0, 1) vs. Uniform(0, 10^100). Recall that the probability that X falls in [0, 1] under the former distribution is the same as the probability that X falls in [0, 10^100] under the latter: 1 (100%).
For Uniform(0, 10^100), that 1 has had to be "spread out" across a much larger space, which should give some intuition as to why the PDF is low: the probability has been spread so thinly that, for each unit of space we "travel", the CDF doesn't increase by much, i.e. the PDF isn't that high.
The same thing happens with 50 samples. The space of possibilities covered by 50 samples is a lot "larger" than the space covered by 1 sample. Over one sample, the space is [0, 2], covering 2 units of space. Over two samples, the space is the square [0, 2] x [0, 2], with an area of 4. Over 50 samples, the space is the hypercube [0, 2]^50, with a 50-dimensional volume of 2^50, a huge space. But the total probability is still 1, so it's going to be "spread out" very thinly across this larger space, hence much smaller values. And so the probability we accumulate per unit of volume as we move across this space is going to be very low, hence a low likelihood value.
So when we draw many samples from a distribution, the likelihood of these samples is usually going to be very small (mostly; there might be spikes where the density is high). I've spoken a little loosely and informally, but hopefully this makes sense. |
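If you want to check the Uniform(0, 2) numbers yourself, here is a rough Python sketch. The function names (uniform_pdf, likelihood, log_likelihood) and the random seed are my own choices for illustration, not anything from the Stack Overflow post, and I'm assuming the 50 samples are independent, so the joint density is just the product of the individual PDF values.

```python
import math
import random


def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): 1/(b - a) inside [a, b], 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def likelihood(samples, a, b):
    """Joint density of independent samples: product of per-sample densities."""
    return math.prod(uniform_pdf(x, a, b) for x in samples)


def log_likelihood(samples, a, b):
    """Sum of log densities; assumes every sample lies inside [a, b]."""
    return sum(math.log(uniform_pdf(x, a, b)) for x in samples)


random.seed(0)  # arbitrary seed, only so the run is repeatable
samples = [random.uniform(0, 2) for _ in range(50)]

single = uniform_pdf(samples[0], 0, 2)     # density at one point
joint = likelihood(samples, 0, 2)          # density at the 50-sample point
volume = 2.0 ** 50                         # volume of the hypercube [0, 2]^50
log_joint = log_likelihood(samples, 0, 2)  # same quantity on the log scale

print(single)          # 0.5, whatever the sample is
print(joint)           # (1/2)^50, about 8.9e-16
print(joint * volume)  # density times volume: total probability 1
print(log_joint)       # 50 * log(1/2), about -34.66
```

The third print is the "spreading out" argument in numbers: the density is tiny, the volume is huge, and their product is still 1. The last line is there because multiplying many small densities eventually underflows to 0 in floating point, which is why people usually add up logs instead.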