The greatest weakness I can see in the scoring system [0] is age. A rating only counts as valid if it is made within 48 hours:

> Made within the first 48 hours of the note’s creation (because we publicly release all rating data after 48 hours) [1]

However, in the real world, understanding a message's context can take much longer than that, especially when new information comes to light and changes the landscape.

The second greatest weakness I see is that raters with a lower mean are automatically filtered out. Whilst you can discuss using APIs to do it, if you have large groups of individuals dedicated to promoting specific viewpoints, you can use that manpower to de-rate anyone promoting an opposing view by ruining their helpfulness average. That makes the system easily abused by highly motivated political factions, especially foreign ones that admit to employing large groups of people for such a purpose.

> Their rater helpfulness score must be at least 0.66 [1]

[0] https://github.com/twitter/birdwatch/blob/main/static/source...

[1] https://twitter.github.io/birdwatch/contributor-scores/#vali...
This is a good thing. The rater helpfulness score measures how closely your helpful/not-helpful ratings match the label each note eventually receives. Because that label is determined by how well the note is rated among people with differing opinions, being accurate means your ratings tend to be less biased. Other accounts aren't voting on your "rater helpfulness score," so it's not subject to brigading.
The 48-hour requirement applies only to valid ratings, and those feed only into the rater helpfulness score, so it has nothing to do with how the notes themselves are rated. Correct me if I'm wrong, but it looks like they were careful about the nuances that you've mentioned.
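To make that reading concrete, here is a rough Python sketch of how I understand the two quoted rules to interact. It is not the actual Birdwatch code: the `Rating` fields, the function names, and the "fraction of valid ratings that agree with the final label" formula are my own simplifying assumptions. Only the 48-hour window and the 0.66 threshold come from the quoted docs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

VALID_WINDOW = timedelta(hours=48)   # from the quoted docs
MIN_RATER_HELPFULNESS = 0.66         # from the quoted docs


@dataclass
class Rating:
    # All field names are hypothetical, chosen for illustration only.
    note_created_at: datetime
    rated_at: datetime
    rated_helpful: bool        # what this rater said about the note
    note_final_helpful: bool   # the label the note eventually received


def is_valid(rating: Rating) -> bool:
    """A rating is 'valid' if made within 48 hours of the note's creation."""
    age = rating.rated_at - rating.note_created_at
    return timedelta(0) <= age <= VALID_WINDOW


def rater_helpfulness(ratings: List[Rating]) -> Optional[float]:
    """Assumed formula: share of valid ratings matching the final label."""
    valid = [r for r in ratings if is_valid(r)]
    if not valid:
        return None  # no valid ratings, so no score
    agreed = sum(r.rated_helpful == r.note_final_helpful for r in valid)
    return agreed / len(valid)


def passes_threshold(ratings: List[Rating]) -> bool:
    """Whether the rater clears the quoted 0.66 minimum."""
    score = rater_helpfulness(ratings)
    return score is not None and score >= MIN_RATER_HELPFULNESS
```

Note that in this sketch nobody else's vote appears as an input to `rater_helpfulness`: the score depends only on the rater's own ratings and the notes' eventual labels, and a late rating is simply left out of the score.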