| HN Mirror

It only tends to a Normal distribution if you estimate P(negative|matches in -ve list) & P(positive|matches in +ve list) with an unbiased, consistent estimator.

A simple 1-gram model like in the question does not model many complexities of natural language e.g. negation ("not bad" != "bad") so you would expect your estimator to over-represent the dictionary with more words that are equal to their adverb-adjusted equivalent. e.g. "not bad" can be described as 'terrible' more readily than 'very good' can be described as excellent since people assign a hyperbolic weighting to their own happiness (utility theory 101)

The sentiment would only tend to a normal distribution if we had perfect estimators for document sentiment which requires advanced POS tagging and models more complex than a 1-gram bag of words aggregation :)