Hacker News
by itsprofitbaron 4561 days ago
Whilst this may be surprising to some people on HN, this happens all of the time to sites who build links in an unnatural way.

For instance, this has happened in the past to well-known brands such as J.C. Penney, after a NYTimes exposé[1], more recently to Interflora[2], and to lots of others.

An apology which RapGenius offered [3] doesn't fix this either.

Is it fair? Yes and No.

The only reason it isn't fair is that the site disappears from Google for the BRAND term e.g. [rap genius]. My personal belief is that, devaluing the site for the BRAND term e.g. [rap genius] actually devalues Google's quality. On the other side of the coin, if someone searches for [X rap genius] whilst they are under penalty it's fair that they do not rank for that either. However, there are obvious reasons as to why the search quality team have decided to do this.

How RapGenius can fix it / How you can too if your site gets a penalty:

First of all, if RapGenius are doing any link building now, they should pause it immediately until they’re out of penalty.

Secondly, in their apology[3] they said:

  "With limited tools (Open Site Explorer), we found some suspicious backlinks to some of our competitors"
They don't actually need any tool beyond Google Webmaster Tools (WMTs) to get out of penalty, although ideally they should clean up all the links, beyond just the ones Google has found (trust me, Google doesn’t find them all within WMTs). Once you get out of a Google manual penalty and then get hit by another one, the search quality team takes a much closer look, and you don’t want that!

Anyway, they should download all the links in WMTs, OSE, Majestic etc (although it looks like they only have OSE[3] so they should just download them from WMTs and OSE) and then remove the duplicates.
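The merge-and-dedupe step could be sketched roughly like this. It is a minimal sketch, assuming each tool's export has already been reduced to a plain list of linking URLs (the real WMTs and OSE export formats are not covered here), and the names `normalize` and `merge_backlinks` are hypothetical:

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparable key: lowercase host, no scheme, no 'www.', no trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def merge_backlinks(*exports: list[str]) -> list[str]:
    """Combine several link lists, keeping the first spelling seen of each distinct link."""
    seen: dict[str, str] = {}
    for export in exports:
        for url in export:
            key = normalize(url)
            if key and key not in seen:
                seen[key] = url.strip()
    return list(seen.values())
```

Treating http/https and www/non-www variants as the same link is a design choice made for this sketch, not something the tools themselves guarantee.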

Once they’ve done this, they should flag every single link which they believe is causing the penalty.

After identifying all the links which are causing the penalty, they should create a Gmail to outreach to all of the sites to remove the links. They should contact all these sites and document every site they’ve contacted along with its status: still live, nofollowed, removed, requested payment, no response, etc.
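The outreach log described above might look something like the following. This is only a sketch: the column names, the `OutreachRecord` type and the `to_csv` helper are assumptions for illustration, and the status values simply mirror the ones listed in the comment.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    STILL_LIVE = "still live"
    NOFOLLOWED = "nofollow"
    REMOVED = "removed"
    REQUESTED_PAYMENT = "requested payment"
    NO_RESPONSE = "no response"


@dataclass
class OutreachRecord:
    link_url: str      # page carrying the unnatural link
    contacted_on: str  # date of the removal request, as YYYY-MM-DD
    status: Status     # latest known outcome of the request


def to_csv(records: list[OutreachRecord]) -> str:
    """Render the outreach log as CSV text, one row per contacted link."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["link_url", "contacted_on", "status"])
    for record in records:
        writer.writerow([record.link_url, record.contacted_on, record.status.value])
    return buffer.getvalue()
```

Keeping the status as a fixed set of values makes it easy to filter later for the links that still need action.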

Having got some links removed or nofollowed, they should then disavow all the other sites that have requested payment or not responded to the removal request. Personally, I usually do my disavows at the domain level, although there are reasons to do this at the URL level as well (Rap Genius needs to decide which level to disavow at).
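As a rough illustration of the domain-level versus URL-level choice: a disavow file is plain text with one entry per line, either a full URL or a `domain:` prefix followed by a hostname, and lines starting with `#` are comments. The sketch below builds such text from (link, status) pairs; the `build_disavow` helper and the status labels are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical status labels for links where removal outreach has failed.
DISAVOW_STATUSES = {"requested payment", "no response"}


def build_disavow(records: list[tuple[str, str]], domain_level: bool = True) -> str:
    """Build disavow-file text from (link_url, status) pairs."""
    entries: list[str] = []
    for url, status in records:
        if status not in DISAVOW_STATUSES:
            continue  # removed or nofollowed links need no disavow
        if domain_level:
            host = urlsplit(url).hostname
            if not host:
                continue  # skip rows without a parseable hostname
            entry = f"domain:{host}"
        else:
            entry = url
        if entry not in entries:
            entries.append(entry)
    header = "# Sites that asked for payment or did not respond to removal requests"
    return "\n".join([header, *entries]) + "\n"
```

With `domain_level=True` several bad pages on one site collapse into a single `domain:` line, which is the main practical difference from listing each URL.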

After submitting the disavow, they should submit a reconsideration request which outlines the work they have done (from the spreadsheet) and also offers Google’s Search Quality Team the login to the Gmail account, to show they’ve tried to get the links removed and that some people have asked for payment etc.

The Google Search Quality team will then review the site and either flag more links to be removed or lift the penalty, after which Rap Genius will start appearing for the BRAND term again, and for other results once the Google algorithms trust the site again.

[1] http://www.nytimes.com/2011/02/13/business/13search.html?_r=...

[2] http://searchengineland.com/google-says-no-comment-on-why-in...

[3] http://news.rapgenius.com/Rap-genius-founders-open-letter-to...

11 comments

You seem to have a lot of experience with this kind of stuff.

Could you possibly give me a tldr on this story?

I recall the exposé talking about how they were trading twitter links for keyword link building. Yet I'm having a hard time working up the requisite outrage.

It's not spam - the people linking back aren't being coerced (i.e. spam comments), and the content it's being linked from is legit (i.e. not a crappy link-farm site void of content).

Is the reason that this is Bad(tm) because the link back is not "organic"? It strikes me as being identical to, say, paying a thousand bloggers writing about Bieber to link to RG - except that last example is impossible to detect. They exchanged a small ad for twitter inbound traffic.

Is there just a blanket ban on trying to divine how the algorithm works? It's a commonly accepted practice to pay other people to promote or write about your product/service/brand.

Essentially Rap Genius violated one of Google's Webmaster Guidelines[1] by attempting to manipulate the SERPs through getting webmasters to link to several lyrics pages in exchange for a tweet. Google considers this a link scheme[2], i.e. an attempt to manipulate the results, specifically under the heading of Buying/Selling Links:

  Buying or selling links that pass PageRank. This includes exchanging money for links, or posts that contain links; exchanging goods or services for links; or sending someone a “free” product in exchange for them writing about it and including a link
There are several other ways to "manipulate" the SERPs, including some which you have identified (spam comments, doorway pages and forum profiles, amongst others), and Google has an algorithm codenamed Penguin which detects and penalises webmasters who attempt to manipulate the search engines in such a way (although it is more complicated than this).

However, Penguin is not the only way in which Google identifies people involved in these practices, as they also have a place to report the links[3].

This is one of Google's Manual Actions[4], which webmasters receive when Google believes they are not providing the user with additional value and/or are trying to manipulate the results.

They cover everything from Thin Content (mainly through Panda) to Hacked Sites to User Generated Spam to the recent Image Mismatch Penalty etc. You can see them all here: https://support.google.com/webmasters/topic/2604771?hl=en&re...

[1] https://support.google.com/webmasters/answer/35769

[2] https://support.google.com/webmasters/answer/66356

[3] https://www.google.com/webmasters/tools/paidlinks?pli=1&hl=e...

[4] https://support.google.com/webmasters/answer/2604824?hl=en

> ...by attempting to manipulate the SERPs through getting webmasters to link to several lyrics pages in exchange for a tweet.

How is that different from paying bloggers to write a product review with a link to a product? Why isn't that considered a "manipulation of SERPs"? You exchange (money/tweet) for a link.

It's not different. As mentioned earlier, paying for links is against one of Google's Webmaster Guidelines[1] and your example specifically falls under the Link Scheme[2] category which is affecting Rap Genius.

Those types of links, reviews in exchange for a link, are considered advertorials, which Interflora were "famously" penalised[3] for and which Google specifically identifies within their Link Scheme examples[2].

[1] https://support.google.com/webmasters/answer/35769

[2] https://support.google.com/webmasters/answer/66356

[3] http://searchengineland.com/google-says-no-comment-on-why-in...

Wait.

I was about to ask if the lesson to draw here was to make it look organic.

But mostly serious question here: how does TechCrunch have any Google rank then? How do most tech publications, for that matter? PR hits are extremely common; they're the name of the game when it comes to cheap content.

It's okay if it looks like it's mutual self interest, and not if it overtly competes with Google's advertising platform?

> how does TechCrunch have any Google rank then? How do most tech publications, for that matter?

TechCrunch does not post advertorials; they post content for free, and they actually posted on their site to reaffirm this fact[1].

Sure, there may be some PR firms etc. who get paid to get that content on to TechCrunch or other tech publications, but TechCrunch writers are not directly compensated for doing so.

As a result, the content they write, from a startup launch to a new feature etc., is considered "natural" by the search engines, as they're choosing to write about it.

Moreover, those publications get the majority of their search engine traffic through being in Google News, à la posting about "Twitter" and getting inserted into the "Twitter" SERPs within the "News" section. Additionally, they leverage internal linking to boost their SERP potential: e.g. whenever they talk about Zulily (they're at #9 for me in Incognito mode, although they might be higher, or on page 2, for you), instead of linking to the site they'll reference the "Tag URL"[2] as well as CrunchBase[3] (although CrunchBase isn't really 'internal' as it's an external site). Likewise, Google loves "fresh content", so when they do a post about something generic such as "credit card numbers" they will naturally be inserted, with few or no links, within the top 7ish results for that term (although this does not always happen), and will naturally lose search engine positioning for that term over time.

[1] http://techcrunch.com/2012/11/08/we-are-worth-at-least-3k/

[2] http://techcrunch.com/tag/zulily/

[3] http://www.crunchbase.com/company/zulily

It's not different, and Google will penalise as needed (if the site is unrelated to the reviewed object, for instance). Also, the FTC can sue if there is no proper disclaimer.
Interesting, does that make sponsored posts necessarily against TOS?
Sponsored posts are what Google calls advertorials, and they are considered to be a link scheme[1], which means they're against Google's webmaster guidelines[2].

However, they're acceptable if the link does not carry any link equity (i.e. it uses the rel="nofollow" attribute).
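To make the nofollow point concrete, here is a small sketch that lists which links on a page would still pass link equity, i.e. those without `nofollow` in their `rel` attribute. It uses only the standard-library HTML parser; the class and function names are hypothetical, and this is not a description of how Google evaluates links.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect (href, is_nofollow) for every <a href=...> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow external"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def followed_links(html: str) -> list[str]:
    """Return hrefs of links that do NOT carry rel="nofollow"."""
    parser = LinkAudit()
    parser.feed(html)
    return [href for href, nofollow in parser.links if not nofollow]
```

A sponsored post would be expected to return none of its paid links from `followed_links`.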

[1] https://support.google.com/webmasters/answer/66356

[2] https://support.google.com/webmasters/answer/35769?hl=en

Interesting. I am in a small niche, where a lot of student society websites ending in .edu have links to their sponsors, and they don't nofollow them.

Typically they don't know what nofollow is, and their sites are run by people who don't really know much about websites.

I can't educate the entire sector. Is it better to steer clear of link building with this broad swath of sites? Most of the other sites in my niche have built links through these sites.

There is a shortage of followed links in my niche as most content creators in my narrow field are commercial and don't link to competitors.

Nah -- it's just like buying ad space on a site though, so the links to your site should be nofollowed.
"You seem to have a lot of experience with this kind of stuff."

Which is sad.

SEO is a cancer and, like cancer, there is no good SEO. Google and their broken search ecosystem have spawned a vast, useless, derivative "industry" that needs to die.

I say "broken ecosystem" because SEO should have a value that is inverse to how well google indexes. That SEO exists at all shows the extent to which it is broken. Website death penalties (or whatever we call this) don't solve the problem.

You are somewhat right and somewhat wrong.

SEO is indeed like what you described, but there is a fine line between staying white-hat and going black-hat.

Now the problem is, the line is so thin you end up crossing it at times. And everybody does. The way Google works, you need to do a bit of both to strike a balance and stay on top.

For example, link building is one of the worst areas, where defining white-hat and black-hat is really difficult. Guest posts are allowed and great, but paid posts aren't. There is no guarantee a guest post is not a paid post.

There are 'digital marketing agencies' which at times are so ridiculous that the backlink included in the guest post adds no value to the post as such, but anchor text magic does all the work.

SEO isn't broken as such, but there are so many backdoors for doing white-hat SEO methods in a concealed black-hat way that it isn't realistic to blame the search ecosystem.

I agree, website penalties are not a good solution to the problem.

There certainly is good SEO. Good SEO is often indistinguishable from good accessibility, good marketing, good design, good conversion optimization, good information theory, good metrics.

You use sites because they used good SEO. You can find StackOverflow posts ranking at one, because they used the good SEO that the good content deserved.

Anyone with an internet business taking your comment at face value will find herself ranking not in the top 20, no matter how good their content.

Good SEO has worth, has value. Even Matt Cutts promotes good SEO and sees the worth. Without SEO analytics, and the knowledge of how to interpret them, you are sailing blind. Without knowledge of the guidelines you get gaffes like these.

There is good SEO and there is vandalization. Rap Genius vandalized their rankings.

No industry needs to die. If it needs to, explain why.

Optimize your site for all users, not just search engines.

If you stay in this naive view of SEO you will never have a successful internet business in many niches. What is sad is that some developers and designers can not deliver a website that is SEO optimized, so a business owner has to pay double to get an SEO to fix all on-page accessibility issues.

This penalty is entirely the result of that blogpost and the PR it generated. I am willing to bet that Google already discovered many of these links, through spam reports, through manual spam fighters and through algorithmic detection. This PR, and the open letter, forced Google's hand. Of course they have to visibly act on it. Of course they are well aware that that niche has some scummy shady practices, and Rap Genius is not the only one. Of course they have a better internal tool than OSE and can check link profiles and detect networks in a matter of minutes.

Besides, you can be a great SEO and never do a single link-building campaign. There is tremendous value in simply finding new content and articles to rank for and drive traffic. I loathe black hat SEO, not for the unfair temporary advantage it gives, but because it creates posts like yours. Because it burns companies that are now much more careful to hire good SEOs. Because I know that good SEO helps people access relevant content.

Like e-mail spam, search engine spam will die. SEOs will be retrained to create websites that are accessible, user friendly, and give content that is relevant to the search query.

Just read: http://static.googleusercontent.com/media/www.google.com/en/...

And try to find a single practice in there that is either manipulative, entirely done to service search engines alone, or hampers accessibility and user friendliness.

Good SEO builds authority. Creates quality mark-up. Makes descriptive website titles. Implements breadcrumbs and a sane website hierarchy. Rids the index of duplicates. Drives more users to your sites. Makes sure that blind people can access your content, like images or video. Makes sites load faster. Makes content more trustworthy etc. Good SEO needs no manipulation. There is a lot to gain by simply aligning yourself with Google's vision.

Please reconsider your tone, or realize you do not add anything of value to the debate, or to the many online business owners on HN. Can you imagine raising funds for an online startup and then have your post be the slide for "online marketing"?

"Anyone with an internet business taking your comment at face value will find herself ranking not in the top 20, no matter how good their content."

... which is a direct indictment of google and their (as you put it) ineffective search results. If the best content is not being served based on its merit, the search results are bad.

A very good explanation for this is that optimizing for good search results and optimizing for high ad clicks from searchers are two different things, and you can't do both. Therefore the price of optimizing for search-ad-revenue is less than optimal results and the unintended consequence is a side-game that parasites play called "SEO". I mean parasite in the nicest possible sense and I think it's an apt description.

"Please reconsider your tone, or realize you do not add anything of value to the debate, or to the many online business owners on HN."

I'm not here for the business owners - I came for the hackers.

>which is a direct indictment of google and their (as you put it) ineffective search results

I think Google has the best quality index of all search engines. There is a Kaggle challenge for Yandex right now where you can optimize the search results taking into account data like user sessions and dwell time. It's hard, but very cool. Google owns hard and cool stuff. Web spam is a multi-faceted and complex problem. But these guys are training neural networks on YouTube stills and making them detect cats. The last time I found a top 10 spam result that irked me was months ago. I filed a report and moved on to the otherwise great index.

>If the best content is not being served based on its merit, the search results are bad.

If the best content is in an image without an alt attribute or longdesc or HTML fallback, on a page with no surrounding text, no sourcing, no pagetitle, no meta description. On a domain that accidentally blocks search engines with robots.txt, has no structural interlinking of pages, no backlinks, takes 50 seconds to load, redirects crawlers to a different page by IP, and only works with javascript on. If that happens to be the best content, then that is a shame. Google could probably still index it :). But the search results are better for not ranking that inaccessible, untrustworthy, undiscoverable piece of content very high. A store can sell the best goods, but if they put blinds in front of the window, do not promote or advertise, make customers crawl over obstacles to place their order, then such a store will simply not do very well.
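One of the failure modes listed above, accidentally blocking search engines with robots.txt, is easy to check for. A minimal sketch using Python's standard `urllib.robotparser`, where the `is_blocked` helper and the "Googlebot" default are assumptions for illustration and the rules are passed in as text rather than fetched:

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

A single leftover `Disallow: /` under `User-agent: *` is enough for this to report every page on the site as blocked.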

>optimizing for good search results and optimizing for high ad clicks from searchers are two different things, and you can't do both.

Why not? Major sites optimize both their organic results and their SEM. Airbnb could create content for each major city they host in and rank organically. They could advertise locally or very targeted to people interested in making use of their services and link to the actual property page or listings.

>I mean parasite in the nicest possible sense and I think it's an apt description. I'm not here for the business owners - I came for the hackers.

Google compresses nearly all the information in the world to a single search query. That search box is the shortest program to expand to any related content that is currently found in the world. A giant information retrieval intelligence sends out its bots to crawl information every second and store it in global time consistent databases. From a black box algorithm with supposedly 200+ ranking factors, an SEO has to give each web page the optimal chance to rank for what it's worth, by reading public documentation, experimenting, analyzing, tracking, predicting, formatting HTML documents in an information retrieval friendly way, doing split testing with contextual bandits or multivariate A/B, improving accessibility, and improving the link graphs of the internet and semantic web with metadata. You come for the hackers, yet you treat SEOs and Google like script kiddies.

"The only reason it isn't fair is that the site disappears from Google for the BRAND term e.g. [rap genius]."

How is that not fair? They were caught attempting to cheat google. Not only that, but cheat their competitors and us (the users). In most scenarios (criminal, sports, academic), being caught means it's game over. And that's exactly what happened here.

When you get caught stealing from a cash register or breaking into a house, they don't just make you put the stuff back and send you on your way. When you're caught cheating in professional sports, you forfeit the entire game; in higher academia, you're kicked out of school.

If anything, I think google is too soft on people attempting to cheat them. When it's obvious and blatant, they need to lay down the law so hard that people won't even consider it next time. This will make the user experience better for everyone. A slap on the wrist tells people the risk is worth it and that means we will be served up potentially worse (less organic) search results.

  How is that not fair?
"My personal belief is that, devaluing the site for the BRAND term e.g. [rap genius] actually devalues Google's quality. On the other side of the coin, if someone searches for [X rap genius] whilst they are under penalty it's fair that they do not rank for that either. However, there are obvious reasons as to why the search quality team have decided to do this."

  If anything, I think google is too soft on people attempting to cheat them. 
Google's Search Quality Team are actually pretty strict when reviewing a reconsideration request, and if the site has previously had a penalty they pay extra attention to the cleanup.
Criminal, sports, and academic concerns are all governed by bodies that impose those sanctions. When Google imposes a sanction they are the judge, jury, and executioner.

Since Google offers a public service and is owned by public shareholders, this poses somewhat of a problem... especially when you consider their market share and whether or not such sanctions offer them a competitive advantage.

> Criminal, sports, and academic concerns are all governed by bodies that impose those sanctions. When Google imposes a sanction they are the judge, jury, and executioner.

Actually, in all of those cases, the judge and executioner (and, usually, the legislature making the rules) are all employees of the same organization. In some cases there is a separate jury as a finder of fact (e.g., in criminal cases in the US and countries with similar legal systems), though in sports and academic cases there may well not be, depending on the particular rules of the particular organization.

Since when does the NCAA act as anything other than the judge, jury, and executioner?
Too true. Another hilarious example is google banning BMW for using doorway pages - http://news.bbc.co.uk/2/hi/technology/4685750.stm

If I remember correctly the site was completely deindexed.

> The only reason it isn't fair is that the site disappears from Google for the BRAND term e.g. [rap genius].

So if instead of being rapgenius.com, they had shelled out big bucks for the domain lyrics.com, they would continue to rank for searches for "lyrics"? Or would someone at Google make a subjective decision about what terms are unique enough to their brand for them to not be penalized for?

Either of these seems much less fair to me than the status quo that all spammers get penalized for all search terms.

In that example they should still rank for "lyrics.com" which would be the BRAND term.

Google have actually done an EMD (exact-match domain) update to devalue sites trying to rank for generics using a generic domain.

As you can see using the Google AdWords Keyword Planner[1], for all locations in English the average searches are:

Lyrics - 1.2M avg. searches/month

Lyrics.com - 110k avg. searches/month

[1] https://adwords.google.com/ko/KeywordPlanner/

I think that it would make sense to allow direct queries for the trademark, e.g., "rapgenius" in the case of rapgenius. In your example "lyrics" would certainly not be trademarkable, so I would imagine "lyrics.com" or similar would be used instead.
>After identifying all the links which are causing the penalty, they should create a Gmail to outreach to all of the sites to remove the links.

Just slightly curious about this, because I receive these 'link removal' emails all the time for a site I run and have never acted upon them (our comments pages are indexed by google but not linked publicly after we switched to Facebook comments). Is there any reason a website owner should act and remove the links? Surely it's not my/our problem?

  Is there any reason a website owner should act and remove the links? Surely it's not my/our problem?
You should only remove links if they do not provide any value to your audience.

A website owner does not need to act and remove the links at all; you have the choice to not respond or to refuse to do so. This is why Google allows people to disavow those links via the disavow tool[1], so they are no longer "counted" as providing the site with any link equity.

[1] https://www.google.com/webmasters/tools/disavow-links-main

> Whilst this may be surprising to some people on HN, this happens all of the time to sites who build links in an unnatural way.

What (if anything) happened between Google and Reddit, by the way? I remember Google results being stuffed with basically identical versions of the same Reddit comments pages (due to Reddit's language-localisation scheme: it.reddit.com , pl.reddit.com and so on), and then big chunks of Reddit (in all language versions) apparently becoming invisible in those Google results.

Reddit tried to resolve it through a code-level setting of the lang within their source code. However, Google ignores all code-level language information, including lang attributes and document type definitions (DTD)[1] etc., because some CMSes set this automatically.

However, Google over the years have also worked on their international/localisation identification of sites. As a result they allow webmasters to geotarget them within WMTs[2] which seems to resolve the Reddit issue.

Having said that, there are other methods to fix the issue as well, from using the geotarget option within WMTs to using the rel="alternate" hreflang annotation, etc.
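For the rel alternate approach, each language version of a page points at its siblings with `<link rel="alternate" hreflang="...">` elements. The sketch below generates such tags for language subdomains like the it.reddit.com and pl.reddit.com examples mentioned earlier; the `hreflang_tags` helper and the sample path are hypothetical, and this is not a claim about what Reddit actually deployed.

```python
from html import escape


def hreflang_tags(path: str, hosts_by_lang: dict[str, str], scheme: str = "http") -> list[str]:
    """Build one <link rel="alternate" hreflang=...> tag per language version of a page."""
    tags = []
    for lang, host in sorted(hosts_by_lang.items()):
        href = f"{scheme}://{host}{path}"
        tags.append(
            f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(href)}" />'
        )
    return tags
```

The same set of tags would go in the head of every language version, so each page declares all of its alternates.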

[1] http://googlewebmastercentral.blogspot.co.uk/2010/03/working...

[2] https://support.google.com/webmasters/answer/62399

> they should create a Gmail to outreach to all of the sites to remove the links.

Yes, that's exactly what spammy link creators should do: add insult to injury by ordering the sites they spammed to clean up their mess. Bonus points for adding a supposedly scary "or we'll have no choice but to ... disavow you" rider.

"My personal belief is that, devaluing the site for the BRAND term e.g. [rap genius] actually devalues Google's quality."

I absolutely second this because I just did a search for Rap Genius and it only showed Twitter. So I did a search for it on Bing instead.

Completely true. This is how any brand that has been manually penalized needs to work its way out.
Great information, but I kind of want to punch the CEO of Rap Genius in the face, for trying to shame the competition in an apology that they are responsible for.

Eh..

For 200 pounds www.linkaudit.co.uk does it all for you