by ai_maker
4135 days ago
Do you have gold-standard labels for your dataset? Can you ensure that the numbers of positive and negative labels are balanced? You can heuristically tune the weights of your lexicon to fit your intuition, but you need evidence to make real progress. If you find that the classes are unbalanced, use an effectiveness score that is robust to imbalance, such as the F-measure, to get a fair picture of your system's performance.
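To illustrate why the F-measure is the fairer score on unbalanced data, here is a minimal sketch in plain Python. The labels, the 9:1 class split, and the helper names (`accuracy`, `precision_recall_f1`) are made up for the example and are not taken from anyone's actual system.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall_f1(gold, pred, positive="pos"):
    """Precision, recall and F1 for the class named by `positive`."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical unbalanced test set: 9 negative documents, 1 positive.
gold = ["neg"] * 9 + ["pos"]
# A degenerate system that labels everything as negative.
always_neg = ["neg"] * 10

print(accuracy(gold, always_neg))             # 0.9
print(precision_recall_f1(gold, always_neg))  # (0.0, 0.0, 0.0)
```

Accuracy rewards the degenerate system with 0.9, while F1 on the positive class is 0.0, which is the kind of gap an unbalanced test set can hide.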
|
I don't think asymmetrical test sets would be a problem if I had training data for documents, since you can reweight to compensate. It seems my problem is that the bigger 'negative word list' over-represents the universe of matches for negative points, which is introducing bias, and I'm not sure how to solve that.
Please see my reply on reweighting in this thread (if you reweight positive words to normalize for the over-represented negative word count, then a neutral sentence will have a positive sentiment score).
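A toy sketch of the reweighting problem, under my own assumptions: the word lists, the `score` function, and scaling positive hits by the ratio of list sizes are all hypothetical, chosen only to make the effect visible.

```python
# Hypothetical lexicon: the negative list is twice as large as the positive one.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful", "poor", "terrible"}


def score(tokens, pos_weight=1.0, neg_weight=1.0):
    """Weighted count of positive hits minus weighted count of negative hits."""
    pos_hits = sum(1 for t in tokens if t in POSITIVE)
    neg_hits = sum(1 for t in tokens if t in NEGATIVE)
    return pos_weight * pos_hits - neg_weight * neg_hits


# A sentence with one positive and one negative word, i.e. roughly neutral.
neutral = "the plot was good but the acting was bad".split()

# Naive normalization: scale positive hits by the ratio of list sizes.
ratio = len(NEGATIVE) / len(POSITIVE)

raw = score(neutral)
reweighted = score(neutral, pos_weight=ratio)

print(raw)         # 0.0
print(reweighted)  # 1.0
```

With equal weights the sentence scores 0.0; once positive words are scaled up to offset the longer negative list, the same sentence scores 1.0, so the correction itself shifts neutral text toward positive.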