Hacker News
by jhchabran 1626 days ago
That's an ambitious goal; I'm not sure I see how that would be maintainable in the long run.

On a much smaller scale, if anyone is interested, I maintain a blacklist focused on those code-snippet content farms that get in the way when you're searching for an error message or a particular function: https://github.com/jhchabran/code-search-blacklist


May I know why my domain (cyberciti.biz) was added to that list? I created my site back in 2000, when there was no StackOverflow or anything like it. So much for creating original content and then getting labelled as a spammer. In fact, some of the top answers on StackOverflow were copied from my work without giving any credit to me. Some people do give credit, though. But go ahead, block a site that actual humans have maintained for 20+ years. Also check my About[1] and Twitter[2] pages. There is no scraping or spamming going on on my part.

[1]https://www.cyberciti.biz/tips/about-us [2]https://twitter.com/nixcraft

Interesting, I have your site on my mental blocklist as one of those scrape and rehost sites.

I'll be honest, I don't remember how I came to that conclusion but I suspect I encountered an unsatisfactory answer to a question I was looking to answer, saw the .biz and drew my conclusions.

The noise to signal ratio for most of my queries is so high that I have to start judging a book by its title, not even its cover.

I've noticed cyberciti.biz showing up in my DDG search results but I've always ignored it because of the initial captcha. I will try it now that I've seen your post here!

The .biz definitely does not help, since it hints to me that it's just another one of those worthless reposting sites, as someone else commented below.

Not OP but dot biz is associated with spam in my head for what it’s worth
A couple of years ago, I was at Google's office, and I talked with someone who works on search about the .biz extension. They said the domain extension doesn't matter. At that time, they said backlinks were one of the most vital signals, apart from some PR. That was like eight years ago. So I never changed the domain name despite owning the .com version too. It would break too many backlinks.
Sure, Google might be fine with a .biz. As a human consuming Google's responses, my eyes typically glaze over on seeing .biz and jump to the next search result. It's not that there is anything particularly wrong with .biz, but this is the first legitimately useful site (to me, probably plenty for others) I've heard of using .biz.
cyberciti.biz is one of the few sites that come up in Google search results for anything code/Linux related that have valuable content. I do wonder why someone would block it.
I agree. Valuable site, and nixcraft adds a ton of value to the Linux community. Thank you, nixcraft!
Thank you for your support!
I wanted to stop by and say thanks for cyberciti.biz! I've been using it since 2001-2002, when I got my first Verio FreeBSD VPS and had to figure out what was going on.

When I see your site pop up in my search results I know the content is going to be more reliable than most of the others. Thanks for the effort you've put into it.

At first scan your site looks like one of those automated scrape and republish sites. I'm curious what got you on that blacklist (misspelling? bad first impression? automated tool gone awry?) though.

Glad you said something though, I wouldn't have looked at it twice without a human attestation.

I kept it simple on purpose. As a result, it loads faster on both desktop and mobile and passes the web.dev PageSpeed Insights test too.
For me personally, the titles are what make it suspicious. I've almost never, NEVER found something good in an article titled "(top) xy ways to z", and I've come to immediately avoid any article with such a title.
Yep, and it doesn't break if you have javascript disabled. Good work.

That doesn't really intersect with my observation in any way though. As a stranger I don't see your intent when I see your website, I just see your website.

I'm always happy to see your site in search results; it's one I've recognised and trusted for CentOS/Linux-related information for years. Thank you!
Thanks for the kind words!
Your comment is exactly why spam prevention is difficult. Sorry for that.
Yes, imagine if this blocklist becomes mainstream and gets used by other major adblockers or extensions. Then there is no central place where a one-man project can ask to have its domain removed, and it will vanish. I often read on HN about how centralized the web is, and then we come across resources that kill independent blogs/sites because of an error on the list maintainer's part.
As a user I think I've put your website on my mental "avoid it" list for its design. I've opened a page now and I feel like I'm instantly in a tunnel vision mode. For UX: it's not a pleasure to scroll up & down; maybe there's also a psychological element about the main content area being so slim in width.

The other comment made me remember there was a captcha too, right? I had been using my own rented server as a VPN for all my internet access. But I'd never have blocked it on a public list - I've read the 'about me' page.

> some of the top answers on StackOverflow were copied from my work without giving any credit to me

That's really frustrating. I'm building a faster search engine for programming queries and just added your site cyberciti.biz as a recommended and curated source of Unix/Linux material. I hope more devs become aware of your work and that you (and your collaborators) receive the credit you deserve. Thanks for your many years of work.

Thank you. Do ping me when your work is ready. I will share it on Twitter :)
What CDN do you use? I was immediately asked to solve a captcha from my phone.
Cloudflare sometimes triggers those when it thinks the IP reputation is not good. This typically happens for data centre IP ranges, as the WAF has an anti-bot feature. So I know it is a problem for some.
Maybe it's the fact that I don't use one of the three major US ISPs. Hopefully CDNs get used to the idea that there can be more than one fiber provider.
Would you mind sharing the Cloudflare Ray ID displayed at the bottom of the screen when you see a captcha? I can look into it, and maybe I will be able to fix it too. Reply here or email me at webmaster@cyberciti.biz. HTH.
Not sure if you changed your Cloudflare settings, or if Cloudflare changed something, but I'm no longer getting the captcha, so that's good, but sadly I can't help debug the original issue.
It's worth a try! Also, thanks for maintaining those lists!
Well there's only one way to find out