| HN Mirror

This is part of my problem - I don't have a labeled dataset outside of my 'positive words' / 'negative words' lists.

I don't think asymmetric test sets would be a problem if I had training data for documents, since you can reweight to compensate. My problem seems to be that the bigger 'negative words' list over-represents the universe of possible negative matches, which introduces bias, and I'm not sure how to solve that.
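
To make the bias concrete, here is a minimal sketch with made-up word lists (the lexicons, the vocabulary, and the function names are all hypothetical, not my real data). It assumes every word in the vocabulary is equally likely to appear, which real text does not satisfy, so treat the number as an illustration only.

```python
# Hypothetical toy lexicons; the negative list is deliberately twice as large.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful", "poor", "weak"}
# Hypothetical vocabulary: the lexicon words plus four neutral words.
VOCAB = POSITIVE | NEGATIVE | {"the", "a", "is", "it"}


def word_score(word):
    """+1 for a positive-list word, -1 for a negative-list word, 0 otherwise."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0


def expected_score_per_word(vocab):
    """Average score of one word drawn uniformly at random from vocab."""
    return sum(word_score(w) for w in vocab) / len(vocab)


baseline = expected_score_per_word(VOCAB)
print(baseline)
```

With 2 positive and 4 negative words in a 10-word vocabulary, the average comes out below zero, so text with no real sentiment drifts negative just because there are more ways to hit the negative list.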

Please see my reply on reweighting in this thread: if you reweight positive words to normalize for the over-represented negative word count, then a neutral sentence will end up with a positive sentiment score.
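
A small sketch of that reweighting problem, again with hypothetical word lists and a hypothetical `score` function. It assumes the simplest normalization, scaling each positive hit by the ratio of the two list sizes, and a "neutral" sentence that contains one word from each list.

```python
# Hypothetical toy lexicons; the negative list is twice as large.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful", "poor", "weak"}


def score(tokens, pos_weight=1.0):
    """Weighted positive hits minus negative hits."""
    pos_hits = sum(1 for t in tokens if t in POSITIVE)
    neg_hits = sum(1 for t in tokens if t in NEGATIVE)
    return pos_weight * pos_hits - neg_hits


# Assumed normalization: scale positive hits by the list-size ratio.
pos_weight = len(NEGATIVE) / len(POSITIVE)

balanced = "good food bad service".split()  # one positive, one negative word
raw = score(balanced)
reweighted = score(balanced, pos_weight)
print(raw, reweighted)
```

Unweighted, the balanced sentence scores zero. Once positive hits count double, the same sentence scores above zero, which is the shift I was describing.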