Hacker News
by versteegen 1024 days ago
Note that "drawing samples from P(x)" just means having training data drawn from P(x).

You can form the 'empirical' probability distribution P'(x) from your n training samples {x_i}, with P'(x_i) = 1/n and P'(x) = 0 for all other x. (If a value shows up k times among the samples, it gets k/n.)

Then finding the θ which minimizes KL(P'(x) ∥ Q(x|θ)) is equivalent to finding the maximum likelihood estimate (MLE) given your training data.
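
To see why: KL(P' ∥ Q) = -H(P') - (1/n) Σ_i log Q(x_i|θ), and the entropy H(P') doesn't depend on θ, so minimizing the KL is the same as maximizing the log-likelihood. Here's a rough numerical sketch of that. The Bernoulli model, the toy samples and the grid search over θ are all my own assumptions for illustration, not anything from the thread.

```python
import math
from collections import Counter

def empirical(samples):
    """Empirical distribution P': each distinct value gets count / n."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}

def bernoulli(x, theta):
    """Assumed model Q(x | theta): a Bernoulli with P(x = 1) = theta."""
    return theta if x == 1 else 1.0 - theta

def kl_to_model(p_emp, theta):
    """KL(P' || Q(.|theta)), summed over the support of P' only."""
    return sum(p * math.log(p / bernoulli(x, theta)) for x, p in p_emp.items())

def log_likelihood(samples, theta):
    """Log-likelihood of the samples under Q(.|theta)."""
    return sum(math.log(bernoulli(x, theta)) for x in samples)

# Toy training data (made up): five 1s and three 0s.
samples = [1, 0, 1, 1, 0, 1, 1, 0]
p_emp = empirical(samples)

# Crude grid search over theta in (0, 1); good enough for a sanity check.
grid = [k / 1000 for k in range(1, 1000)]
theta_kl = min(grid, key=lambda t: kl_to_model(p_emp, t))
theta_mle = max(grid, key=lambda t: log_likelihood(samples, t))

print(theta_kl, theta_mle)  # both 0.625, i.e. 5/8
```

Both searches land on the sample mean, which is the usual closed-form MLE for a Bernoulli.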

(Note: I don't know what's meant by "the min/max of some probability distribution P(x)" and suggest ignoring that)



Just writing hand-wavily :)