Yes, I think this could actually work. Community Notes has a basic reputation system: users need to "Earn In" by rating notes as "Helpful" when those notes are ultimately classified by the algorithm as helpful. Once enough attackers earn in, they can totally break the algorithm.

Breaking it is not as simple as upvoting a lot of, say, right-wing or left-wing posts though. The algorithm will simply classify all the attackers as having a very positive or negative polarization factor, and decide that their votes can be explained by this factor.
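
To make that concrete, here is a minimal sketch. It assumes the usual matrix-factorization form of the model (predicted rating = global intercept + user intercept + note intercept + user factor × note factor); the user names, note names, and all parameter values below are made up for illustration and are not fitted by the real algorithm. It only shows that purely partisan votes can be reproduced by the factor term alone, leaving every note's intercept (its "helpfulness") at zero.

```python
# Assumed model form: rating ≈ mu + user_intercept + note_intercept + user_factor * note_factor
MU = 0.5  # assumed global intercept, chosen so predictions land on 0 or 1


def predict(user_intercept, note_intercept, user_factor, note_factor, mu=MU):
    """Predicted rating under the assumed one-factor model."""
    return mu + user_intercept + note_intercept + user_factor * note_factor


# Hypothetical attackers who upvote every right-leaning note and
# downvote every left-leaning one (1.0 = "Helpful", 0.0 = "Not Helpful").
ratings = {
    ("attacker_1", "note_right"): 1.0,
    ("attacker_1", "note_left"): 0.0,
    ("attacker_2", "note_right"): 1.0,
    ("attacker_2", "note_left"): 0.0,
    ("attacker_3", "note_right"): 1.0,
    ("attacker_3", "note_left"): 0.0,
}

# One possible explanation: every attacker gets the same strong polarity
# factor, and every intercept stays at zero.
user_factors = {"attacker_1": 1.0, "attacker_2": 1.0, "attacker_3": 1.0}
note_factors = {"note_right": 0.5, "note_left": -0.5}
note_intercepts = {"note_right": 0.0, "note_left": 0.0}

# Largest gap between an observed vote and the factor-only prediction.
max_error = max(
    abs(predict(0.0, note_intercepts[n], user_factors[u], note_factors[n]) - r)
    for (u, n), r in ratings.items()
)

print("max prediction error with zero note intercepts:", max_error)
```

Because the polarity factor accounts for every vote exactly, nothing is left over to push a note's intercept up, which is why this kind of partisan brigading does not by itself get a note rated helpful.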

What would work is upvoting *unhelpful* posts. I have actually simulated this attack using synthetic data, and sure enough, it totally breaks the algorithm. I write about it in this article: https://jonathanwarden.com/improving-bridge-based-ranking/