It varies from case to case, depending on a lot of factors like severity, impact on users, etc. In the particular case above, to find out what might have happened, I just picked a domain at random and dug into its history, where I found autogenerated pages with tons of typos for each domain. I kind of thought one example would make the point. Does it help that much to give another example? I can look up more.

For http://www.bigbadblogdirectory.com/ it looks like you were autogenerating typos not just for websites, but for popular blogs. So http://www.bigbadblogdirectory.com/jeffmatthewsisnotmakingth... looks like it had the following (I had to cut out the vast majority of the typos because the comment was too long for HN): jeffmatthewsisnotmakingthisup.blogspoot.com, jeffmatthewsisnotmakingthisup.bloyspot.com, jegfmatthewsisnotmakingthisup.blogspot.com, jeffmatthewsisnomakingthisup.blogspot.com, jeffmatthwesisnotmakingthisup.blogspot.com, jeffmatthewsisnotmakingthisup.nlogspot.com, jeffmatthewsisnotmakingthisup.blogspot.ccom, jeffmatthewsisnotmakingthisup.bligspot.com, jeffmatthewsisnotakingthisup.blogspot.com, jeffmatthewsisnotmakinghtisup.blogspot.com, jeffmatthewsisnotmacingthisup.blogspot.com, jdffmatthewsisnotmakingthisup.blogspot.com, jeffmatthewsisnot akingthisup.blogspot.com, ieffmatthewsisnotmakingthisup.blogspot.com, jeffmatthewsisnotmakingthisup/blogspot.com, jeffmatthewsisnotmajingthisup.blogspot.com, jeffmatthewsisnotmakingthishp.blogspot.com, jeff atthewsisnotmakingthisup.blogspot.com, jeffmatthewsisnotmakingthisup.blogspot/com, jeffmatthewwisnotmakingthisup.blogspot.com.

I could post more examples from the other domains, but my point is that this is the sort of thing that users dislike and complain about. If you were a blogger and saw pages like this ranking for your name or your site's name, you probably wouldn't be happy either. From looking at a few domains, I don't think we overgeneralized from a few pages in this case.
I know that you've moved on and the domains are shut down now. And I'm not trying to be cantankerous. I'm just trying to say that from our point of view, there are good reasons to take action on sites like this so that users don't complain to us.
Each site actually took a long time to make. They either involved generating a data set from scratch or piecing together and parsing other large data sets. For this one in particular, I was crawling the Web for feed discovery and was planning on adding features like grouping the best posts by category.
Yeah, I would love to know about some others, e.g. japanese2englishdictionary.com, idnscan.com, serverslist.com. Also, did you actually get any complaints about this, or was it triggered by some other threshold or signal? On a side note, I still get requests about exposing some of this data, i.e. sites behind IP addresses or lists of domains matching some criteria. In any case, thanks for the info!
I can understand the need to take action. I just think it could have been handled better. If the typos were the problem, I would have removed them immediately had someone told me, and that process could have been automated. In retrospect, it seems pretty obvious, but it wasn't at the time.