Hacker News
by xpose2000 5722 days ago
Maybe this can help . . .

I have a few websites that automatically make new posts. As of 10/14, they all show 0 pages indexed in Google. Previously they would get a few thousand visitors per day.

I guess Google decided they violate its terms and removed them. It seems to me it was a manual removal.

I received no emails in Webmaster Tools about the removal.


"I have a few websites that automatically make new posts."

Making a bunch of autogenerated sites has its risks. For example, if you were just taking a bunch of MP3 names or Hot Trends queries, scraping Twitter for mentions of those phrases, and slapping it all up on a website with scripts, that tends to cruft up our index with autogenerated content that users complain about and that violates our quality guidelines. Likewise, if all you were doing was scraping Twitter for phrases like "sad," "heartbroken," or "heartless" and throwing that scraped Twitter content up on a webpage with a script, users would also complain about that autogenerated content, and it would violate our guidelines. Would that be helpful insight?

As a concrete case to discuss, what about something like http://poeet.com/

I made this over a weekend. And the people whose poetry is being captured love it. But it is auto-generated in the sense you're talking about.

It actually went down for a bit and I got a bunch of complaints, enough that I got it back up relatively quickly.

There's a big difference between your site http://poeet.com and a clear-cut case of autogenerated spam. Your site is actually quite creative: it aggregates content from a Twitter hashtag and indexes short poems that might otherwise go unnoticed, and it shows the user's original tweet and @username without manipulating anything. The original poster was likely scraping content without citation, just to wrap the duplicate content in ads.

Matt - I agree autogenerated content is a problem and is polluting the search results, so I'm glad you guys have taken action. But what about sites like bibleknowledgebookstore.com and articlesubmissionreview.com that are buying links, creating fake content, and spamming Web 2.0 profile pages and forums? How come tactics like these not only work but dominate competitive markets? What's the point of pursuing high-quality editorial links when sites are rewarded for essentially spamming?

Don't forget Google needs content publishers for AdSense. Surely that's the only reason brain-liquefying content mills like eHow don't get slapped down? This junk is ridiculous (and this was one of the first pages I looked at):

http://www.ehow.com/blended-families/

How to Plan a Happy Blended Family
How to Harmony in Your New Blended Family
How to have harmony in your new Blended Family
How to Achieve Harmony in a Blended Family
How to Nurture A Blended Family
How to Successfully Manage a Blended Family

WTF is this junk? Why does ehow.com get 3 million Google visitors a day? The mind boggles!

Great question, though eHow is built from user submissions and paid article writers, not an individual scraping other users' content, republishing it, and not linking back, i.e. 100% plagiarism. I do agree, though, that eHow is PURE junk and nothing but a site to generate ad revenue. I'm not sure whether they offer the users who submit articles any profit sharing, but if not, those users are being shortchanged as well. eHow is run by the people behind Enom and a few other networks who give Google a ton of dough for advertising.

Matt, you completely rock. Those are fine examples. :)

xpose2000, happy to help without getting uncomfortably specific. ;)