| Cool solution, I like it. Would it help to first look at the names and sign-up times of a particular set of users, and only then search for their votes on common posts? That would give a slightly different ratio, Σ_P Votes(U1, P) / Votes(Un, P), where U1 is a particular user, P ranges over the posts that user voted on, and Un is each of the remaining users, up to n users in total. This occurs to me because you can still make it run more efficiently by limiting the number of users you examine rather than restricting the run to certain posts (the number of queries should be the same for a given number of either users or posts), and it would let you start the top of the detection funnel with heuristics around obviously fake IDs or correlated sign-up times. That might help catch vote bots that set up fake accounts which all vote for the same posts but also vote randomly on other posts at some frequency; the first algorithm could not tell such a ring apart from voters who simply have similar taste, like the effect observed on Pinterest. Anyway, just thinking out loud. Or whatever the typing equivalent of that is. |
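To make the user-first idea a bit more concrete, here's a rough Python sketch. Everything in it is my own assumption rather than the original algorithm: votes are plain (user, post) pairs, sign-up times are a dict of datetimes, the names `signup_clusters` and `co_vote_ratios` are hypothetical, and the ratio is read as "the share of U1's voted posts that Un also voted on". The first stage groups users whose sign-ups land close together; the second stage only compares votes within those groups.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def signup_clusters(signups, window):
    """Hypothetical first-stage heuristic: group users whose consecutive
    sign-up times are within `window` of each other. Only groups of 2+
    users are returned, since a lone account can't be a ring."""
    ordered = sorted(signups.items(), key=lambda kv: kv[1])
    clusters, current, last_ts = [], [], None
    for user, ts in ordered:
        # A gap larger than the window closes the current group.
        if last_ts is not None and ts - last_ts > window:
            if len(current) > 1:
                clusters.append(current)
            current = []
        current.append(user)
        last_ts = ts
    if len(current) > 1:
        clusters.append(current)
    return clusters


def co_vote_ratios(votes, seed, others):
    """For seed user U1, return the fraction of U1's voted posts that each
    other user Un also voted on (an assumed reading of the ratio)."""
    posts_by_user = defaultdict(set)
    for user, post in votes:
        posts_by_user[user].add(post)
    seed_posts = posts_by_user[seed]
    candidates = [u for u in others if u != seed]
    if not seed_posts:
        return {u: 0.0 for u in candidates}
    return {u: len(seed_posts & posts_by_user[u]) / len(seed_posts) for u in candidates}


if __name__ == "__main__":
    t0 = datetime(2013, 1, 1)
    signups = {
        "alice": t0,
        "bot1": t0 + timedelta(days=30),
        "bot2": t0 + timedelta(days=30, minutes=2),
        "bot3": t0 + timedelta(days=30, minutes=5),
    }
    votes = [("bot1", "p1"), ("bot1", "p2"), ("bot2", "p1"), ("bot2", "p2"),
             ("bot2", "p9"), ("bot3", "p1"), ("alice", "p2")]
    clusters = signup_clusters(signups, timedelta(minutes=10))
    print(clusters)  # [['bot1', 'bot2', 'bot3']]
    print(co_vote_ratios(votes, "bot1", clusters[0]))  # {'bot2': 1.0, 'bot3': 0.5}
```

Because the denominator here is U1's own votes, bot2's extra "random" vote on p9 doesn't dilute its score against bot1. Whether that's the right normalization for catching the noisy bots described above is an open question; it's just one choice.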