| HN Mirror



	by mathisd 855 days ago
	I don't understand why the maximum of the likelihood is not zero in the example given. Isn't P(X = x | theta = theta_0) always zero for continuous distributions?

1 comments

knightoffaith 855 days ago

The actual probability is 0, but the probability density is not 0. Same reason why the probability that I pick 0.5 from a uniform distribution from 0 to 1 is 0, but the value of the probability density function of the distribution at 0.5 is 1.


cubefox 855 days ago

What is this point value then measuring? A literal "density" doesn't seem plausible either, as points arguably do not have any "density".


knightoffaith 855 days ago

I'll give the mathematical explanation. If X is a continuous random variable, the probability that X takes on any particular value x is 0, i.e. P(X = x) = 0. However, it still makes sense to talk about P(X < x), which is clearly not 0 in general. For example, suppose X is uniformly distributed from 0 to 1. P(X = 0.5) = 0, but P(X < 0.5) = 0.5 (there's a 50% chance that X takes on a value less than 0.5). We can talk about P(X < x) as a function of x; in the case of this uniform distribution, P(X < x) = x for x in [0, 1]. (There's a 30% chance that X takes on a value less than 0.3, an 80% chance that X takes on a value less than 0.8, etc.) This is called the cumulative distribution function (CDF): it tells us the cumulative probability (accumulating from -infinity to x). The probability density function (PDF) is the rate of change, i.e. the derivative, of the cumulative distribution function. At a particular x, how "quickly" is the cumulative distribution function increasing at that point? That is the question that the probability density function answers, if that makes sense.

In the case of the uniform distribution from 0 to 1, the cumulative distribution function is x on [0, 1], and since the derivative of x is 1, the probability density function is 1 from 0 to 1 and 0 elsewhere. This makes sense; the probability P(X < x) isn't increasing faster at one point than at any other. The exception is x outside [0, 1], where the probability density is 0: e.g. P(X < 2) is 100%, and increasing x beyond 2 does not change this (it's still 100% because X only takes on values within [0, 1]).
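A minimal Python sketch of this relationship, using only the standard library. The helper names `uniform_cdf` and `numeric_pdf` are made up for illustration, and the derivative is approximated with a central difference rather than computed exactly:

```python
def uniform_cdf(x, a=0.0, b=1.0):
    """P(X < x) for X ~ Uniform(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def numeric_pdf(x, a=0.0, b=1.0, h=1e-6):
    """Central-difference estimate of the CDF's slope at x (illustrative helper)."""
    return (uniform_cdf(x + h, a, b) - uniform_cdf(x - h, a, b)) / (2 * h)


# The probability of a shrinking interval around 0.5 goes to 0,
# but the probability per unit length stays at 1.
for h in (0.1, 0.01, 0.001):
    prob = uniform_cdf(0.5 + h) - uniform_cdf(0.5 - h)
    print(f"half-width {h}: P = {prob:.4f}, P / length = {prob / (2 * h):.4f}")

density_inside = numeric_pdf(0.5)   # close to 1: the CDF rises with slope 1 here
density_outside = numeric_pdf(2.0)  # 0: the CDF is flat at 1 beyond x = 1
print(density_inside, density_outside)
```

The loop is the point: the interval's probability shrinks toward P(X = 0.5) = 0, while the ratio of probability to interval length, which is what the density measures, does not shrink.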


cubefox 854 days ago

That's interesting and intuitive for a uniform distribution. What does it then mean on a non-uniform distribution for a value to be very small? Is there some interpretation for that? The Stack Overflow post actually mentions values that are extremely close to zero.


knightoffaith 853 days ago

So, just to be sure, even for a uniform distribution, the values can be small. Consider the uniform distribution from 0 to 10^100. The CDF for this distribution is P(X < x) = x/10^100 for x in [0, 10^100]. The derivative of this (the PDF) is p(x) = 1/10^100. This is true for any x (again, unless it is outside the range [0, 10^100]), which makes sense because the "speed" with which the probability is increasing is constant regardless of x. Why are these values smaller than for the uniform distribution on [0, 1]? It's because the probability increases much more slowly per unit of x on Uniform(0, 10^100) than it does on Uniform(0, 1). Going from P(X < 0) to P(X < 1) increases the probability by only 1/10^100 for Uniform(0, 10^100), while it increases the probability by 1 for Uniform(0, 1).

So PDFs can have small values regardless of whether they are uniform or not. What a small PDF at a point x indicates is that the CDF is increasing very "slowly" at that x. I'll emphasize this point - PDF values are not probabilities. They are rates of change of the CDF.
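To make "PDF values are not probabilities" concrete, here is a small sketch. The helper `uniform_pdf` is a hypothetical name, and Uniform(0, 0.5) is an extra case added purely for illustration, since its density exceeds 1:

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b) at x: 1 / (b - a) inside [a, b], else 0."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


narrow = uniform_pdf(0.5, 0.0, 1.0)   # density on [0, 1]
wide = uniform_pdf(0.5, 0.0, 1e100)   # density on [0, 10^100], about 1e-100
tall = uniform_pdf(0.25, 0.0, 0.5)    # density on [0, 0.5], larger than 1

# Density times width recovers the total probability of 1 in each case.
for name, density, width in (("narrow", narrow, 1.0),
                             ("wide", wide, 1e100),
                             ("tall", tall, 0.5)):
    print(name, density, density * width)
```

A density of 2 cannot be a probability, and a density of 1/10^100 does not mean the outcome is nearly impossible; in both cases only the density multiplied by a length gives a probability.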

For some further understanding of the Stack Overflow post, let's consider Uniform(0, 2). The PDF is p(x) = 1/2 on [0, 2]. Suppose the author of the Stack Overflow post drew 50 independent samples from this distribution. Regardless of what those 50 samples were, the likelihood he would have gotten is the product of the 50 density values, (1/2)^50 = 1/(2^50), about 8.9 × 10^-16. Why is this so small?
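The arithmetic can be checked with a short sketch, assuming the 50 samples are independent so that the likelihood is the product of the individual densities. The seed and variable names are arbitrary choices for illustration:

```python
import math
import random

random.seed(0)  # arbitrary seed, only to make the run repeatable

a, b, n = 0.0, 2.0, 50
samples = [random.uniform(a, b) for _ in range(n)]

# Every sample lies in [0, 2], so each one contributes a density of 1/2.
densities = [1.0 / (b - a) if a <= x <= b else 0.0 for x in samples]

# Likelihood of the whole sample: product of the per-sample densities.
likelihood = math.prod(densities)

print(likelihood)
print(likelihood == 0.5 ** n)
```

Changing the seed changes the samples but not the result, since the density is the same everywhere on [0, 2].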

(I'll give a rather loose and informal explanation here, but I can be more formal if you'd like, if this doesn't make sense.) Think back to Uniform(0, 1) vs. Uniform(0, 10^100). Recall that the probability that a draw falls in [0, 1] under the former distribution is the same as the probability that a draw falls in [0, 10^100] under the latter, i.e. 1 (100%). In the case of the latter distribution, that 1 has had to be "spread out" across a larger space, which should give some intuition as to why the PDF is low: since the probability has been spread so thinly across the space, the CDF doesn't increase much per unit of space that we "travel", i.e. the PDF isn't that high.

When we look at PDF values over the space of possibilities covered by 50 samples, that space is a lot "larger" than the space covered by 1 sample. (Over one sample, the space is [0,2], covering 2 units of space. Over two samples, the space is the square [0,2] x [0,2], with an area of 4. Over 50 samples, the space is the hypercube [0,2]^50, with a 50-dimensional volume of 2^50, a huge space.) But the total probability is still 1, so it's going to be "spread out" very thinly across this larger space, hence much smaller values. And so, the probability we accumulate per unit as we move across this space is going to be very low, hence a low likelihood value.
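A rough sketch of the "spreading out" for Uniform(0, 2), again assuming independent draws so that the joint density is the product of the one-dimensional densities. The variable names are illustrative only:

```python
side = 2.0  # length of the interval [0, 2]

for n in (1, 2, 50):
    volume = side ** n            # n-dimensional volume of [0, 2]^n
    density = (1.0 / side) ** n   # joint density of n independent draws
    print(f"n = {n:2d}: volume = {volume:.3e}, density = {density:.3e}, "
          f"volume * density = {volume * density}")
```

In every row the volume times the density is 1: the total probability stays fixed while the space it covers grows, so the density per unit of volume has to fall.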

So when we draw many samples from a distribution, the likelihood of these samples is going to be very small (mostly---there might be spikes where they're high).

I've spoken a little loosely and informally, but hopefully this makes sense.


cubefox 852 days ago

I just don't quite understand why more samples mean that the "space" gets higher dimensional and consequently less dense. Aren't the samples just estimating the underlying PDF, such that more samples shouldn't decrease the magnitude of the PDF? So if he drew those samples from Uniform(0, 2), shouldn't the resulting PDF simply approximate a value of 1/2=0.5 everywhere? I'm probably misunderstanding something basic here.
