|
|
|
|
|
by abhgh
3248 days ago
|
|
Oh, I see it now. Thanks for clarifying! In that case, you needn't have kept the denominator around at all. Since P(w) is not based on a class i.e. its not class conditioned, the classifier could have directly calculated P(C_1|w)/P(C_0|w). The P(w) term cancels out, and you end up with the product of ratios of feature probabilities conditioned on the classes. Note though that, for K-classes, K>2, the number of parameters you need to store would blow up. You would need have these N ratios: P(w_j|C_x)/P(w_j|C_y) for all possible classes x,y. Here N is the number of features. So, in all N*C(K,2) values. On the contrary, multiclass softmax regression (the discriminative analogue for NB) would need NK parameters. |
|
It really is just a special case of a more general rule that a given PGM with a fixed parametrization can be trained discriminatively or generatively.