Hacker News new | ask | show | jobs
by lonk11 1164 days ago
1. The article suggests running PageRank on a graph of users as nodes and edges as - "has user A upvoted a comment of user B" as an edge:

""" Since it's likely that one user may upvote multiple comments from the same user, we check whether a user has already upvoted a comment from that specific user before considering their upvote. In other words, we treat user profiles as nodes and upvotes for comments as edges. """

This is a very lossy conversion of the actual data:

a. It does not distinguish if user C and D upvoted the same comment of user B or not. Maybe one comment was good and the other was bad. But when you convert it to the above graph you only get that C and D upvoted some comment of user B.

b. It does not account for the number of comments user B left - 1000 or 5? This incentivizes spam because there is no upside to not posting.

c. It ignores users that do not comment but upvote valuable comments themselves. Then the PageRank is used to weigh upvotes of users. This is backwards. The PageRank values should capture the value of each user's past upvotes in order to use it as a prediction of how valuable their future upvotes will be. But the suggested algorithm uses the value of the user's past comments as a weight of their future upvotes.

To fix this I think the graph needs to be changed to a bipartite graph of users and comments as nodes and upvotes and flags as directed edges (when the author posts a comment - this should be represented as an implicit upvote of the comment). Then you can calculate how valuable each user's upvotes (and flags) are.

2. The "gameability" of PageRank stems from the fact that the random walk algorithm treats each users equally as a starting point. It means that you can create a ton of fake users and upvote the comments of a target user you want to artificially boost in upvote-power. Each time the random walk starts at one of those fake users the walk will end up in the target user - increasing their PageRank score.

My proposal to solve the "gameability" problem is to start each walk from you - the user that views HackerNews. It means that your past upvotes become the starting step of the random walk and so the resulting PageRank will be personalized for you. Instead of a single PageRank reputation score (which captures how user A's contributions to HN have been to all users), there is a set of personalized scores that capture how useful other users have been to you.

I'm building https://linklonk.com which uses this kind of algorithm to rank both links and comments. The details of the ranking algorithm are here: https://linklonk.com/item/3292763817660940288