by option1138 5360 days ago
Hi Matt,

We don't know each other but I think we know of each other. I'm rather immersed in webspam detection and found this incredibly interesting.

You imply that the challenge is finding a solution that scales. Yet it sounds to me from your response that this site was flagged via manual review. Did I misunderstand?

If I heard you correctly, then is manual review a significant part of the equation in your webspam detection methodology? You guys are boiling the ocean, so I find that rather hard to swallow.

The likelier conclusion, as I see it, is that he had a significant number of (auto-generated) pages on his site flagged as spam, and that in turn raised some eyebrows.

BTW, you and your team are doing some amazing work. I wish the paid side were up to the standards you set.

1 comment

The site was flagged both algorithmically and also escalated to a member of the manual webspam team.

The basic philosophy is to do as much as we can algorithmically, but there will always be a residual of hard cases that computers might not handle as well (e.g. spotting hacked sites and identifying the parts of a site that have been hacked). That's where the manual webspam team really adds a lot of value.

In addition to things like removing sites, the data from the manual webspam team is also used to train the next generations of our algorithms. For example, the hacked site data that our manual team produced not only helped webmasters directly; we also used it to build an automatic hacked site detector.
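As a rough illustration of the loop described above (not the actual system; the class name, thresholds, and score input are all hypothetical), a minimal sketch might let an automated score decide the clear cases, send the uncertain ones to a manual queue, and keep each human verdict as a labeled example for later retraining:

```python
from dataclasses import dataclass, field


@dataclass
class TriageQueue:
    # Hypothetical cutoffs; a real system would tune these on data.
    spam_threshold: float = 0.9
    clean_threshold: float = 0.1
    manual_queue: list = field(default_factory=list)
    training_data: list = field(default_factory=list)

    def triage(self, site: str, score: float) -> str:
        """Decide clear cases automatically; escalate the uncertain rest."""
        if score >= self.spam_threshold:
            return "spam"
        if score <= self.clean_threshold:
            return "clean"
        self.manual_queue.append(site)
        return "escalated"

    def record_manual_verdict(self, site: str, is_spam: bool) -> None:
        """Store a reviewer's decision as a labeled training example."""
        self.manual_queue.remove(site)
        self.training_data.append((site, is_spam))
```

Here `score` stands in for whatever an automated classifier outputs, and `training_data` is simply the list a later retraining job would read. In this simplified version only uncertain sites are escalated, which is an assumption of the sketch rather than something stated above.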

If you're interested, I made a video about the interaction between algorithmic and manual spamfighting here: http://www.youtube.com/watch?v=ES01L4xjSXE