by barneso
4132 days ago
Two simple things you could do:

1. Insert each negative example six times into your training set, or weight negative examples accordingly, i.e. use (#positive matches - 6 * #negative matches) / (2 * positive word count) as your score.
2. Take the distribution of sentiment scores calculated over held-out data (or over the training set itself, but be warned that this will skew your results) and compute its mean and standard deviation. Normalize each score by subtracting the mean and dividing by the standard deviation. You can then say that sentiment is positive when the result is > 0 and negative when it is < 0, with the absolute value giving the strength of the classification.
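A minimal sketch of the two suggestions, assuming a naive lowercase whitespace tokenizer and small hypothetical word lists (`POSITIVE`, `NEGATIVE`); the helper names `weighted_score`, `fit_normalizer` and `normalize` are illustrative, not from the thread. The weight of 6 on negative matches follows the comment as written and is exposed as a parameter.

```python
import statistics

# Hypothetical word lists for illustration only; the thread does not give the real lists.
POSITIVE = {"strong", "growth", "profit"}
NEGATIVE = {"weak", "loss", "decline", "risk", "debt", "lawsuit"}


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not from the thread)."""
    return text.lower().split()


def weighted_score(tokens, positive, negative, neg_weight=6.0):
    """Step 1: positive matches minus neg_weight times negative matches."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos - neg_weight * neg


def fit_normalizer(scores):
    """Step 2: mean and sample standard deviation of held-out scores."""
    if len(scores) < 2:
        raise ValueError("need at least two held-out scores")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    if std == 0:
        raise ValueError("held-out scores have zero spread; cannot normalize")
    return mean, std


def normalize(score, mean, std):
    """z-score: > 0 reads as positive, < 0 as negative, |z| as strength."""
    return (score - mean) / std


if __name__ == "__main__":
    # Hypothetical held-out snippets, not real data.
    held_out = [
        "strong profit growth",
        "weak sales and a loss",
        "debt risk rising",
        "steady quarter",
    ]
    scores = [weighted_score(tokenize(t), POSITIVE, NEGATIVE) for t in held_out]
    mean, std = fit_normalizer(scores)
    for text, s in zip(held_out, scores):
        print(f"{text!r}: raw={s:+.1f} z={normalize(s, mean, std):+.2f}")
```

One design note: z-scoring divides out any constant positive scale factor, so if "positive word count" is a fixed property of the word list, the `/ (2 * positive word count)` term does not change the normalized result, which is why the sketch leaves it out.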
I think you mean to upweight my positive list by 6 (since it is 1/6 the size of the negative list), but the problem with this is the same as in my reply to the other comment: you just shift the bias.
Consider the sentence: 'there are strong and weak divisions in company X's Europe operations'
The only word matches in your word lists are 'strong' on your positive list and 'weak' on your negative list.
If you weight these counts as you describe, the sentiment score for this sentence will be -0.44 + 1 = 0.56, even though the sentence is clearly neutral and should have a score of 0.
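A small sketch of the objection, assuming single-word lists containing only 'strong' and 'weak' (the only matches named above) and a hypothetical `lexicon_score` helper. The reply's exact numbers come from a normalization it does not spell out, so this sketch does not reproduce them; it only shows that any unequal weighting moves a one-positive, one-negative sentence away from 0, in the direction of whichever list is upweighted.

```python
# Hypothetical single-word lists: per the reply, 'strong' and 'weak' are the
# only matches for this sentence.
POSITIVE = {"strong"}
NEGATIVE = {"weak"}
SENTENCE = "there are strong and weak divisions in company X's Europe operations"


def lexicon_score(text, positive, negative, pos_weight=1.0, neg_weight=1.0):
    """Weighted count of positive matches minus weighted count of negative matches."""
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    tokens = [t.strip(".,!?\"'") for t in text.lower().split()]
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return pos_weight * pos - neg_weight * neg


if __name__ == "__main__":
    print(lexicon_score(SENTENCE, POSITIVE, NEGATIVE))                  # → 0.0 (unweighted)
    print(lexicon_score(SENTENCE, POSITIVE, NEGATIVE, pos_weight=6.0))  # → 5.0 (positive upweighted)
    print(lexicon_score(SENTENCE, POSITIVE, NEGATIVE, neg_weight=6.0))  # → -5.0 (negative upweighted)
```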