by TwoFx
3344 days ago
From the paper: "To investigate the noisiness of using Reddit as a source of self-annotated sarcasm we estimate the proportion of false positives and false negatives induced by our filtering. This is done by having three human evaluators manually check a random subset of 500 comments from SARC-main tagged as sarcastic and 500 tagged as non-sarcastic, with full access to the comment’s context. A comment was labeled a false positive if a majority determined that the “/s” tag was not an annotation but part of the sentence and a false negative if a majority determined that the comment author was clearly being sarcastic. After evaluation, the false positive rate was determined to be 2.0% and the false negative rate 3.0%. Although the false positive rate is reasonable, the false negative rate is significant compared to the sarcasm proportion, indicating large variation in the working definition of sarcasm and the need for methods that can handle noisy data in the unbalanced setting."
From what was said above, it seems this has not been taken into account.
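
To make the quoted point about the false negative rate concrete, here is a rough back-of-the-envelope sketch. The 2.0% and 3.0% rates are from the paper's quote; the share of comments tagged sarcastic (`tagged_share`) is not given in the quote, so the 1% used below is purely a hypothetical placeholder, and the function name is my own.

```python
def missed_to_tagged_ratio(tagged_share, fp_rate, fn_rate):
    """Ratio of untagged-but-sarcastic comments to correctly tagged ones.

    tagged_share: fraction of all comments carrying the "/s" tag
                  (ASSUMPTION: not stated in the quoted passage).
    fp_rate:      fraction of tagged comments that are not really sarcastic.
    fn_rate:      fraction of untagged comments that are actually sarcastic.
    """
    correctly_tagged = (1.0 - fp_rate) * tagged_share
    missed = fn_rate * (1.0 - tagged_share)
    return missed / correctly_tagged


# Rates from the quote; tagged_share = 0.01 is a made-up illustration.
ratio = missed_to_tagged_ratio(tagged_share=0.01, fp_rate=0.02, fn_rate=0.03)
print(f"missed sarcastic per correctly tagged sarcastic: {ratio:.2f}")
```

With that hypothetical 1% share, the untagged sarcastic comments would outnumber the correctly tagged ones by roughly three to one, which is why a 3.0% false negative rate can matter a lot when the tagged class is rare.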