by ANaimi
3980 days ago
The formula/collusion ratio is exponential, but not factorial. The implementation, however, is very efficient: instead of applying the algorithm to the complete dataset of users, we apply it to each group of users within a post. This drastically reduces the running time. The implementation goes over every post and computes the ratio for the voters within that post. It then removes one user from that group and recalculates the ratio. If the ratio drops, it brings that user back in. If it increases, it keeps them out. You can check the implementation here (click on Edit Algorithm):
https://algorithmia.com/algorithms/ANaimi/SimpleVoteRingDete...
Running SimpleVoteRingDetection on the complete Product Hunt dataset (16k+ posts, 52k+ users) takes a few seconds. If you have a dataset for any other website or application, you can easily feed it into the algorithm and experiment with it.
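For anyone who wants to experiment without opening the editor, here is a rough Python sketch of the per-post loop described above. It is not the SimpleVoteRingDetection source. The ratio in particular is a stand-in I am assuming for illustration (posts shared by the whole group divided by posts voted on by anyone in the group), and the names `collusion_ratio`, `refine_group`, `detect_rings` and the `threshold` parameter are all hypothetical.

```python
from collections import defaultdict


def collusion_ratio(group, votes_by_user):
    """Stand-in ratio (an assumption, not the original formula):
    posts every member voted on / posts any member voted on."""
    if len(group) < 2:
        return 0.0
    post_sets = [votes_by_user[user] for user in group]
    shared = set.intersection(*post_sets)
    combined = set.union(*post_sets)
    return len(shared) / len(combined)


def refine_group(voters, votes_by_user):
    """Try dropping each voter once; keep them out only if the ratio increases."""
    group = set(voters)
    best = collusion_ratio(group, votes_by_user)
    for user in sorted(voters):
        if len(group) <= 2:
            break  # a "ring" of one user is meaningless
        candidate = group - {user}
        ratio = collusion_ratio(candidate, votes_by_user)
        if ratio > best:
            group, best = candidate, ratio  # ratio went up: keep the user out
        # otherwise the user stays in the group
    return group, best


def detect_rings(votes, threshold=0.5):
    """votes is an iterable of (user, post) pairs; returns {post: (group, ratio)}."""
    votes_by_user = defaultdict(set)
    voters_by_post = defaultdict(set)
    for user, post in votes:
        votes_by_user[user].add(post)
        voters_by_post[post].add(user)

    rings = {}
    for post, voters in voters_by_post.items():
        if len(voters) < 2:
            continue
        group, ratio = refine_group(voters, votes_by_user)
        if ratio >= threshold:
            rings[post] = (group, ratio)
    return rings
```

Because each pass only looks at the voters of one post, the work grows with the size of each post's voter group rather than with the full user base, which is the point made above about running time.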
|
Is it helpful to first look at the names and sign-up times of a particular set of users, and then search for votes on common posts? This would result in a slightly different ratio:
SUM Votes(U1, P) / Votes(Un, P)
where U1 is a particular user, P is the post voted on by that user, and Un is the rest of the users up to n total users.
This occurs to me because you could still make it run efficiently by limiting the number of users you examine, rather than running across only certain posts (it should be the same number of queries for a given number of either users or posts). It would also let you start the top of the detection funnel with heuristics around obviously fake IDs or correlated sign-up times.
This might help catch vote bots that set up fake accounts, all vote for the same posts, and also vote randomly at a certain frequency for other posts. The first algorithm would not distinguish a true vote ring from voters with similar trends in taste (such as the effect observed on Pinterest).
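To make the idea a bit more concrete, here is a small Python sketch of that funnel: first cluster accounts by sign-up time, then measure how much of a seed user's voting the rest of the cluster shares. This is only one possible reading of the ratio above. The function names, the one-hour window, and the minimum cluster size are all assumptions made up for illustration.

```python
def signup_clusters(signups, window_seconds=3600, min_size=3):
    """Group users whose consecutive sign-up timestamps are at most
    window_seconds apart. signups maps user -> sign-up time in seconds."""
    ordered = sorted(signups.items(), key=lambda item: item[1])
    clusters, current = [], []
    for user, timestamp in ordered:
        # A gap larger than the window closes the current cluster.
        if current and timestamp - current[-1][1] > window_seconds:
            if len(current) >= min_size:
                clusters.append([name for name, _ in current])
            current = []
        current.append((user, timestamp))
    if len(current) >= min_size:
        clusters.append([name for name, _ in current])
    return clusters


def overlap_with_seed(seed, others, votes_by_user):
    """For each other user, the share of the seed's posts they also voted on."""
    seed_posts = votes_by_user.get(seed, set())
    if not seed_posts:
        return {}
    return {
        user: len(seed_posts & votes_by_user.get(user, set())) / len(seed_posts)
        for user in others
    }
```

Only users that land in a sign-up cluster would then be passed to the overlap check, so the expensive vote comparison runs on a short list of suspicious accounts instead of everyone.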
Anyway, just thinking out loud. Or whatever the typing equivalent of that is.