by imh
3251 days ago
Heh, nope, you're right, I was writing too quickly. I meant to say to take the log of the probability term in p(c_1|w). So you take p(c_1|w) = 1/(1+stuff) and turn it into p(c_1|w) = 1/(1+exp(log(stuff))). stuff is a product, so log(stuff) is a sum. Good catch.
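To make the rewrite concrete, here is a minimal Python sketch of the two-class case. It assumes "stuff" is the ratio p(c_0) * prod_j p(w_j|c_0) / (p(c_1) * prod_j p(w_j|c_1)); the function names and the numbers in the usage note are made up for illustration and are not from the thread.

```python
import math

def posterior_c1_via_logs(prior_c1, lik_c1, lik_c0):
    """p(c_1|w) = 1 / (1 + exp(log(stuff))), with log(stuff) built as a sum."""
    prior_c0 = 1.0 - prior_c1
    # Assumed definition: stuff = p(c_0) prod_j p(w_j|c_0) / (p(c_1) prod_j p(w_j|c_1))
    log_stuff = math.log(prior_c0) - math.log(prior_c1)
    for p1, p0 in zip(lik_c1, lik_c0):
        # Each feature contributes one additive term instead of one factor.
        log_stuff += math.log(p0) - math.log(p1)
    return 1.0 / (1.0 + math.exp(log_stuff))

def posterior_c1_direct(prior_c1, lik_c1, lik_c0):
    """Same posterior computed from the raw products, for comparison."""
    joint1 = prior_c1 * math.prod(lik_c1)
    joint0 = (1.0 - prior_c1) * math.prod(lik_c0)
    return joint1 / (joint1 + joint0)
```

Calling both with, say, prior_c1 = 0.4, lik_c1 = [0.2, 0.5, 0.1] and lik_c0 = [0.3, 0.1, 0.4] should give the same value up to floating-point error, which is the whole point: the product form and the sum-of-logs form are the same classifier.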
|
In that case, you needn't have kept the denominator around at all. Since P(w) does not depend on the class, i.e. it's not class-conditioned, the classifier could have directly calculated P(C_1|w)/P(C_0|w). The P(w) term cancels out, and you end up with the product of ratios of feature probabilities conditioned on the classes (times the prior ratio P(C_1)/P(C_0)).
Note though that, for K classes with K>2, the number of parameters you need to store would blow up. You would need to have these N ratios: P(w_j|C_x)/P(w_j|C_y) for every pair of classes x,y. Here N is the number of features. So, in all, N*C(K,2) values. In contrast, multiclass softmax regression (the discriminative analogue of NB) would need NK parameters.
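A rough Python sketch of both points, under the same naive Bayes assumptions. The helper names are hypothetical, and the softmax count covers weights only (bias terms are ignored).

```python
import math

def log_odds_c1_vs_c0(prior_c1, lik_c1, lik_c0):
    """log of P(C_1|w)/P(C_0|w); P(w) has cancelled, so it never appears."""
    total = math.log(prior_c1) - math.log(1.0 - prior_c1)
    for p1, p0 in zip(lik_c1, lik_c0):
        # One stored ratio P(w_j|C_1)/P(w_j|C_0) per feature.
        total += math.log(p1) - math.log(p0)
    return total

def pairwise_ratio_count(n_features, n_classes):
    """N ratios for each unordered pair of classes: N * C(K, 2)."""
    return n_features * math.comb(n_classes, 2)

def softmax_weight_count(n_features, n_classes):
    """One weight per (feature, class) pair: N * K."""
    return n_features * n_classes
```

Predict C_1 when the log-odds is positive. On the counts: with N = 1000 and K = 10, the pairwise scheme stores 45000 ratios against 10000 softmax weights. The two counts coincide at K = 3, and the pairwise one grows quadratically in K after that.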