by eob
1024 days ago
As for practical use cases, one is approximately optimizing a function:
- You want to find the min/max of some probability distribution P(x).
- P(x) is too complicated to find a closed-form min, but you can draw samples from it.
- So instead, you carefully construct some OTHER probability distribution Q(x|θ), parameterized by θ, that you claim is structurally similar "enough" to P(x).
- Now you find the θ which minimizes the KL divergence KL(P(x) || Q(x|θ)). That gives you the parameters θ that make Q(x|θ) [approximately] "most" similar to P(x), without ever having minimized P(x) directly.

It was a trick that came up a lot when AI consisted of giant Bayesian plate models for each specific task that you had to hand-optimize.
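To make the steps above concrete, here is a minimal sketch in Python with NumPy. Everything specific in it is an assumption for illustration: a gamma distribution stands in for the "too complicated" P(x), Q(x|θ) is a Gaussian with θ = (mu, sigma), and `avg_neg_log_q` is a hypothetical helper name. Since KL(P || Q) = -H(P) - E_P[log Q(x|θ)] and H(P) does not depend on θ, minimizing the KL over θ amounts to minimizing the sample average of -log Q(x|θ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for P(x): pretend we can only draw samples from it.
# The gamma distribution here is an arbitrary illustrative choice.
samples = rng.gamma(shape=2.0, scale=1.5, size=20_000)


def avg_neg_log_q(x, mu, sigma):
    """Sample estimate of -E_P[log Q(x|theta)] for a Gaussian Q(x|mu, sigma)."""
    return np.mean(
        0.5 * np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)
    )


# For a Gaussian Q the minimizer has a closed form:
# the sample mean and the (ddof=0) sample standard deviation.
mu_hat = samples.mean()
sigma_hat = samples.std()

best = avg_neg_log_q(samples, mu_hat, sigma_hat)
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}, objective={best:.4f}")
```

For a Q without a closed-form minimizer you would hand the same objective to a numerical optimizer instead. Note that this only needs samples from P, never a density you can evaluate.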
You can form the 'empirical' probability distribution P'(x) from your n training samples {x_i}, with P'(x_i) = 1/n and P'(x) = 0 for all other x.
Then finding the θ which minimizes KL(P'(x) ∥ Q(x|θ)) is equivalent to finding the maximum likelihood estimate (MLE) given your training data.
(Note: I don't know what's meant by "the min/max of some probability distribution P(x)" and suggest ignoring that.)
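The equivalence with maximum likelihood can be spelled out in a few lines. This is a sketch under the same assumptions as the definition above: the n samples x_i are distinct, so P'(x_i) = 1/n, and Q(x_i|θ) > 0 at every sample.

```latex
\begin{align*}
\mathrm{KL}\bigl(P' \,\|\, Q_\theta\bigr)
  &= \sum_{i=1}^{n} P'(x_i) \log \frac{P'(x_i)}{Q(x_i \mid \theta)} \\
  &= \sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n}
     \;-\; \frac{1}{n} \sum_{i=1}^{n} \log Q(x_i \mid \theta) \\
  &= -\log n \;-\; \frac{1}{n} \sum_{i=1}^{n} \log Q(x_i \mid \theta), \\[4pt]
\arg\min_\theta \mathrm{KL}\bigl(P' \,\|\, Q_\theta\bigr)
  &= \arg\max_\theta \sum_{i=1}^{n} \log Q(x_i \mid \theta).
\end{align*}
```

The -log n term does not depend on θ, so the θ that minimizes the KL divergence is exactly the θ that maximizes the log-likelihood of the training data.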