by imh
3246 days ago
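Not true. Let's derive it. Here's Naive Bayes (NB) for binary classification (c is the class, w = (w_1, ..., w_n) the words of the document), starting from Bayes' rule:
p(c|w) * p(w) = p(w|c) * p(c)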
p(c|w) = p(w|c) * p(c) / p(w)
p(c|w) = p(w|c) * p(c) / [sum_i p(w|c_i) * p(c_i)]
Let's look at the probability of class 1:
p(c_1|w) = p(w|c_1) * p(c_1) / [sum_i p(w|c_i) * p(c_i)]
Notice how the numerator is going to show up in the denominator. We can simplify that by dividing both the numerator and the denominator by it:
p(c_1|w) = 1 / [sum_i p(w|c_i) * p(c_i) / (p(w|c_1) * p(c_1))]
The i = 1 term of the sum cancels to 1, and with two classes the only other term is i = 0:
p(c_1|w) = 1 / [1 + p(w|c_0) / p(w|c_1) * p(c_0) / p(c_1)]
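A quick numeric sanity check of this step, as a minimal sketch with made-up priors and likelihoods (the numbers and function names are hypothetical, chosen only for illustration): the ratio form gives the same p(c_1|w) as normalizing by the full evidence sum.

```python
# Hypothetical numbers for a binary problem: priors p(c_0), p(c_1) and the
# likelihoods p(w|c_0), p(w|c_1) of one observed document w.
prior = [0.7, 0.3]
lik = [2e-5, 9e-5]


def posterior_c1_normalized(prior, lik):
    """p(c_1|w) via Bayes' rule, dividing by sum_i p(w|c_i) * p(c_i)."""
    evidence = sum(l * p for l, p in zip(lik, prior))
    return lik[1] * prior[1] / evidence


def posterior_c1_ratio(prior, lik):
    """p(c_1|w) = 1 / (1 + p(w|c_0)/p(w|c_1) * p(c_0)/p(c_1))."""
    return 1.0 / (1.0 + (lik[0] / lik[1]) * (prior[0] / prior[1]))


# Both equal 27/41 (about 0.6585) up to floating-point rounding.
print(posterior_c1_normalized(prior, lik))
print(posterior_c1_ratio(prior, lik))
```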
Now let's apply the NB assumption (the words are conditionally independent given the class):
p(w|c_0) / p(w|c_1) = prod_i p(w_i|c_0) / p(w_i|c_1)
Now write the ratio inside the brackets of the final p(c_1|w) I derived as the exp of its log. The log turns the product in p(w|c_0) / p(w|c_1) into a sum, so you get an exponentiated sum with one parameter per word, log p(w_i|c_0) / p(w_i|c_1), plus an intercept, log p(c_0) / p(c_1). You end up with exactly the same 1/(1+exp(linear stuff)), with the same parametrization and form, that you have in logistic regression [0].

This is an instance of a broader point: a given graphical model with a given fixed parametrization can be trained either generatively or discriminatively. The two will end up learning different models, but that's because they make different assumptions, not because they use different parametrizations.

[0] linear stuff = intercept + sum_i (coefficient of word_i), with i running over the words of the document.
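To make this concrete, here is a minimal sketch assuming a multinomial NB over word counts with Laplace smoothing, fit on a tiny hypothetical corpus (the corpus, smoothing choice, and function names are all illustrative assumptions). It reads off an intercept and one weight per vocabulary word from the generatively trained NB, then checks that the logistic form reproduces the NB posterior. It uses 1/(1+exp(-z)) with z = log p(c_1)/p(c_0) + sum_i log p(w_i|c_1)/p(w_i|c_0), which is the "linear stuff" above with its sign flipped.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, label) pairs with labels 0 and 1.
docs = [
    (["cheap", "pills", "cheap"], 1),
    (["meeting", "tomorrow"], 0),
    (["cheap", "meeting"], 0),
    (["pills", "now"], 1),
]
vocab = sorted({w for toks, _ in docs for w in toks})


def fit_multinomial_nb(docs, vocab, alpha=1.0):
    """Generative training: class priors and Laplace-smoothed p(word|class)."""
    labels = [y for _, y in docs]
    prior = {c: labels.count(c) / len(labels) for c in (0, 1)}
    theta = {}
    for c in (0, 1):
        counts = Counter(w for toks, y in docs if y == c for w in toks)
        total = sum(counts.values()) + alpha * len(vocab)
        theta[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return prior, theta


def nb_posterior_c1(tokens, prior, theta):
    """p(c_1|w) from Bayes' rule with the NB likelihood.

    The multinomial coefficient is the same for both classes, so it cancels
    in the normalization and is omitted here.
    """
    joint = {c: prior[c] * math.prod(theta[c][w] for w in tokens) for c in (0, 1)}
    return joint[1] / (joint[0] + joint[1])


def nb_as_logistic(prior, theta, vocab):
    """Read off the logistic parametrization: intercept + one weight per word."""
    intercept = math.log(prior[1] / prior[0])
    weights = {w: math.log(theta[1][w] / theta[0][w]) for w in vocab}
    return intercept, weights


def logistic_posterior_c1(tokens, intercept, weights):
    """1 / (1 + exp(-z)), z = intercept + sum of the weights of the document's words."""
    z = intercept + sum(weights[w] for w in tokens)
    return 1.0 / (1.0 + math.exp(-z))


prior, theta = fit_multinomial_nb(docs, vocab)
b, wts = nb_as_logistic(prior, theta, vocab)
doc = ["cheap", "pills", "tomorrow"]
print(nb_posterior_c1(doc, prior, theta))
print(logistic_posterior_c1(doc, b, wts))  # same value as the line above
```

Fitting a logistic regression to the same data by maximizing the conditional likelihood would keep this exact functional form but generally land on different intercept and weights, which is the generative vs. discriminative distinction made above.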