| That would be an exceedingly thin signal. Very few posts will have incoming links, so you'll get very little scoring data. The signal will be hugely susceptible to outlier bias -- The Post That Goes Viral, and generates a huge number of incoming links -- will dominate the rankings. Because the signal is so thin, distortion and manipulation (through link farming) will be cheap and difficult to detect (a small number of links across a large number of sites). Don't get me wrong: looking at incidental behaviour is useful, and can often be much more beneficial than looking at direct actions. But remember that all of these signals are actually proxies for some ineffable quantity you're trying to measure: quality. (The very definition of which should leave you crying on the floor after a few hours. Or days. Or weeks. Or months. Or years....) Many minds have attempted this task. All have failed. Your correspondent included. (Small site, many moons ago, since surrendered its electrons back to the Great Disk in the Sky.) |
The problem is essentially this: PageRank is basically the probability that a random walk through the link graph will end up on your site. Linking back to yourself, and to no other websites, therefore gives a big boost to your PageRank, because the random walk gets stuck on your site. Of course it’s easy to just ignore self-links, but you can get essentially the same effect through clique-like groups of websites, and this can be more difficult to detect.
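To make the "stuck random walk" effect concrete, here is a minimal sketch of PageRank by power iteration on a made-up four-page graph. The graph, the `pagerank` helper, and the damping factor of 0.85 are all assumptions for illustration, not anything a real search engine is known to use in this form. It compares page 3's score when it links out honestly against its score when it links only to itself.

```python
def pagerank(links, n, damping=0.85, iters=200):
    """Toy PageRank by power iteration (illustrative helper, not a library API).

    links maps a page index to the list of pages it links to.
    """
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Every page receives the "random jump" share first.
        new = [(1.0 - damping) / n] * n
        for src in range(n):
            targets = links.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[src] / n
                for dst in range(n):
                    new[dst] += share
        rank = new
    return rank


# Hypothetical graph: 0 -> 1 -> 2 -> {0, 3}; only page 3's outlinks differ.
honest = {0: [1], 1: [2], 2: [0, 3], 3: [0]}   # page 3 links back out
selfish = {0: [1], 1: [2], 2: [0, 3], 3: [3]}  # page 3 links only to itself

r_honest = pagerank(honest, 4)
r_selfish = pagerank(selfish, 4)

print("page 3, honest :", round(r_honest[3], 3))
print("page 3, selfish:", round(r_selfish[3], 3))
```

In this toy graph the self-linking page ends up holding well over half of the total rank. Dropping self-links before ranking removes this particular trick, but a small clique of pages that link only to each other traps the walk in the same way.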
What’s interesting is that an algorithm based on how electrical current flows (so a link is a one-way resistor, i.e. a resistor in series with a diode) would not have this problem. Attaching a conductive loop to some point in a circuit does not change how current flows. Electrons don’t get stuck in loops because they don’t drift around randomly; they move from lower voltage to higher voltage.
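The loop claim can be checked on a tiny circuit. The sketch below is a simplification under stated assumptions: every link is a plain 1-ohm resistor (the diode is left out, so this is not the one-way model described above), the source is held at 1 V and the sink at 0 V, and `solve_voltages` is a hypothetical helper that relaxes each free node to the average of its neighbours' voltages, which is what Kirchhoff's current law requires when all resistors are equal.

```python
def solve_voltages(edges, fixed, n, sweeps=2000):
    """Relax node voltages in a network of equal resistors (illustrative helper).

    edges is a list of (a, b) pairs; fixed maps a node to its pinned voltage.
    """
    neighbors = {i: [] for i in range(n)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    v = [fixed.get(i, 0.0) for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i not in fixed and neighbors[i]:
                # Equal resistors: a free node sits at the mean of its neighbours.
                v[i] = sum(v[j] for j in neighbors[i]) / len(neighbors[i])
    return v


fixed = {0: 1.0, 2: 0.0}  # node 0 is the source, node 2 is the sink

# Plain chain: 0 - 1 - 2
plain = solve_voltages([(0, 1), (1, 2)], fixed, 3)

# Same chain with a loop 1 - 3 - 4 - 1 hanging off node 1
looped = solve_voltages([(0, 1), (1, 2), (1, 3), (3, 4), (4, 1)], fixed, 5)

# With 1-ohm resistors, current on an edge equals the voltage difference.
current_plain = plain[0] - plain[1]
current_looped = looped[0] - looped[1]
loop_currents = [looped[1] - looped[3], looped[3] - looped[4], looped[4] - looped[1]]

print("source current without loop:", current_plain)
print("source current with loop   :", current_looped)
print("currents around the loop   :", loop_currents)
```

Every node in the attached loop settles at the same voltage as its attachment point, so no current flows around it and the source-to-sink current is unchanged. That is the contrast with the random-walk model, where the same loop would soak up rank.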