by vermarish 857 days ago
Right. But if you make the notation slightly more explicit, then the integral of L(data, params) over data is 1. This follows from the independence assumption.

So we ARE working with a probability function. Its output can be interpreted as probabilities. It's just that we're maximizing L = P(events | params) with respect to params.

2 comments

The likelihood function is a function of params for a fixed value of data and it is not a probability function.

There is another function - a function of data for fixed params - which is a probability density. That doesn’t change the fact that the likelihood function isn’t.

The independence has nothing to do with the integral being 1, to be honest. You could write a model where the observations are not independent but the (multivariate) integral over their domain will still be 1.
But for such a model, the joint pdf would not be written simply as a product of each individual pdf. That's what independence provides.
If by “joint probability” you mean function(params, data) there is no joint probability here in general.

L(params, data) is constructed from a family of density functions p(data), one for each possible value of params. The integral of L(params, data) over params may be anything, or it may diverge. You don’t need any extra independence assumption either.
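A quick numerical sketch of that point, assuming an exponential model p(x | rate) = rate * exp(-rate * x) picked purely for illustration (the model, the helper names, and the fixed values 2.0 and 0.5 are my choices, not anything from the thread):

```python
import math

def density(x, rate):
    """Exponential density p(x | rate) = rate * exp(-rate * x), for x >= 0."""
    return rate * math.exp(-rate * x)

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Fix the param (rate = 2.0) and integrate over the data: a density, so close to 1.
over_data = integrate(lambda x: density(x, 2.0), 0.0, 50.0)

# Fix the data (x = 0.5) and integrate over the param: this is the likelihood.
# Analytically the integral is 1 / x**2 = 4, not 1.
over_params = integrate(lambda r: density(0.5, r), 0.0, 200.0)

print(f"integral over data   (rate fixed): {over_data:.4f}")
print(f"integral over params (data fixed): {over_params:.4f}")
```

The same function integrates to 1 in one argument and to 1/x^2 in the other. For the "or diverge" case, take a Uniform(0, theta) model: the likelihood is 1/theta for theta >= x, and its integral over theta is infinite.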

Or maybe you mean “joint probability” as p(data1, data2) when data is composed of two observations, for example. But you don’t need any independence assumption for that probability density to integrate to one! It necessarily does that - whether you can factorize it as p’(data1)p’’(data2) or not.
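To make that concrete, here is a rough sketch with two correlated observations, assuming a bivariate normal with unit variances and correlation 0.8 (the distribution, the correlation value, and the helper names are illustrative assumptions only):

```python
import math

RHO = 0.8  # assumed correlation, chosen only for illustration

def joint(x, y, rho=RHO):
    """Bivariate normal density with unit variances and correlation rho."""
    norm = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    quad = (x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))
    return norm * math.exp(-quad)

def marginal(x):
    """Standard normal density, the marginal of each coordinate."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate_2d(f, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of f over [lo, hi]^2."""
    h = (hi - lo) / n
    mids = [lo + (i + 0.5) * h for i in range(n)]
    return h * h * sum(f(x, y) for x in mids for y in mids)

# The joint density integrates to (approximately) 1 over the data...
total = integrate_2d(joint, -8.0, 8.0)

# ...even though it does not factorize into the product of its marginals.
joint_at_1 = joint(1.0, 1.0)
product_at_1 = marginal(1.0) * marginal(1.0)

print(f"integral of joint pdf: {total:.4f}")
print(f"joint(1, 1) = {joint_at_1:.4f}, marginal(1) * marginal(1) = {product_at_1:.4f}")
```

Independence only decides whether the joint can be written as a product. The normalization holds either way.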