by PeterStuer 894 days ago
At first glance, it feels like the most effective way to game this system is to grind user credit through aggregate low-polarization support on fairly neutral, low-impact posts, then strategically 'spend' it on higher-profile polarizing posts. Is that a fair 'red teaming' observation?
1 comment

Yes, I think this could actually work. Community Notes has a basic reputation system: users need to "Earn In" by rating notes as "Helpful" that are ultimately classified by the algorithm as helpful. Once enough attackers earn in, they can totally break the algorithm.
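
To make the earn-in step concrete, here is a rough sketch of that kind of reputation check. The function names, the data layout, and the threshold of 5 are illustrative assumptions, not the actual Community Notes implementation.

```python
# Hypothetical sketch of an "Earn In" check. Names and the threshold
# value are assumptions for illustration only.

def rating_impact(user_ratings, final_status):
    """Count the user's 'Helpful' ratings on notes that the algorithm
    ultimately classified as helpful."""
    return sum(
        1
        for note_id, rated_helpful in user_ratings.items()
        if rated_helpful and final_status.get(note_id) == "helpful"
    )

def has_earned_in(user_ratings, final_status, threshold=5):
    """True once the user has enough matching ratings (threshold is assumed)."""
    return rating_impact(user_ratings, final_status) >= threshold

# Final classification of six notes (made-up data).
final_status = {
    "n1": "helpful", "n2": "helpful", "n3": "helpful",
    "n4": "helpful", "n5": "helpful", "n6": "unhelpful",
}

# A patient attacker only rates notes that are likely to end up helpful.
attacker_ratings = {"n1": True, "n2": True, "n3": True, "n4": True, "n5": True}

print(rating_impact(attacker_ratings, final_status))
print(has_earned_in(attacker_ratings, final_status))
```

Nothing in a check like this distinguishes a sincere rater from someone grinding credit on safe notes, which is why the earn-in step alone does not stop the attack.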

Breaking it is not as simple as upvoting a lot of, say, right-wing or left-wing posts though. The algorithm will simply classify all the attackers as having a very positive or negative polarization factor, and decide that their votes can be explained by this factor.
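
A toy illustration of why one-sided voting gets absorbed: the sketch below fits a small model of the form rating ≈ mu + user intercept + note intercept + user factor × note factor, where the note intercept stands in for helpfulness. The function name, the regularization weights, the learning rate, and the synthetic ratings are all assumptions chosen for this example; it is not the production algorithm.

```python
import numpy as np

def fit_bridge_model(R, lam_i=0.15, lam_f=0.03, lr=0.002, steps=4000, seed=0):
    """Fit R[u, n] ~ mu + b[u] + c[n] + f[u] * g[n] by plain gradient descent.

    R is a dense users-by-notes array of 0/1 ratings. Hyperparameters are
    assumed values tuned for this tiny example.
    """
    rng = np.random.default_rng(seed)
    n_users, n_notes = R.shape
    mu = 0.0
    b = np.zeros(n_users)                      # user intercepts
    c = np.zeros(n_notes)                      # note intercepts ("helpfulness")
    f = rng.normal(scale=0.1, size=n_users)    # user polarization factors
    g = rng.normal(scale=0.1, size=n_notes)    # note polarization factors
    for _ in range(steps):
        pred = mu + b[:, None] + c[None, :] + np.outer(f, g)
        err = pred - R
        # Gradients of squared error plus L2 penalties, computed before updating.
        grad_mu = 2 * err.sum() + 2 * lam_i * mu
        grad_b = 2 * err.sum(axis=1) + 2 * lam_i * b
        grad_c = 2 * err.sum(axis=0) + 2 * lam_i * c
        grad_f = 2 * (err @ g) + 2 * lam_f * f
        grad_g = 2 * (err.T @ f) + 2 * lam_f * g
        mu -= lr * grad_mu
        b -= lr * grad_b
        c -= lr * grad_c
        f -= lr * grad_f
        g -= lr * grad_g
    return mu, b, c, f, g

# Notes 0-1 are upvoted only by the "left" camp, notes 2-3 only by the
# "right" camp, and note 4 is upvoted by everyone (made-up data).
left = np.array([1, 1, 0, 0, 1])
right = np.array([0, 0, 1, 1, 1])
R = np.vstack([np.tile(left, (10, 1)), np.tile(right, (10, 1))]).astype(float)

mu, b, c, f, g = fit_bridge_model(R)
print("note intercepts:", np.round(c, 2))
print("mean user factor, left vs right:", f[:10].mean(), f[10:].mean())
```

In this setup the two camps end up with user factors of opposite sign, and the partisan notes get a lower intercept than the note both camps upvoted, because the factor term already accounts for the partisan ratings.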

What would work is upvoting *unhelpful* posts. I have actually simulated this attack using synthetic data and sure enough it totally breaks the algorithm. I write about it in this article: https://jonathanwarden.com/improving-bridge-based-ranking/