Hacker News
by jozefjarosciak 2080 days ago
I am running it through a certain set of filters. From my SEO days I recall that new websites are often penalized by search engines for containing certain keywords. Considering this is a new site with 300 million plus posts that I am not able to read and moderate, this is the best way I know of to deal with it. But perhaps you're right and I should get rid of it. I'll think about it. This is a valid comment.
6 comments

Please do not filter anything. A project of this scope is greater than any SEO issues you may have.
Since you seem intent on being a reference Usenet archive, I think it's important to preserve the integrity of the original material. Moderating posts 20 or 30 years after the fact seems ill-advised. If you modify the content in any way, at least put up a prominent notice so that people don't get confused by the website name.

Also, it seems that your parsing process strips headers and that you don't keep the raw messages. However, I remember that on some newsgroups people used to pass secret messages in headers that only those "in the know" would look for, and it would be a shame to lose that. Access to posts in raw format would be nice in this scenario.
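
To illustrate the point about headers, here is a minimal Python sketch of parsing a post while keeping both the raw message and every header, including nonstandard ones. The sample message, the `X-Hidden-Note` header, and the `split_post` helper are all made up for illustration; nothing here reflects how the site actually parses posts.

```python
from email import message_from_string

# Invented sample post; the X-Hidden-Note header is purely illustrative.
RAW = (
    "From: someone@example.invalid\n"
    "Newsgroups: alt.alien.visitors\n"
    "Subject: test\n"
    "X-Hidden-Note: only visible in the raw headers\n"
    "\n"
    "Body text.\n"
)

def split_post(raw):
    """Parse a post but keep the untouched raw text alongside the parsed parts."""
    msg = message_from_string(raw)
    return {
        "raw": raw,                 # original message, stored verbatim
        "headers": msg.items(),     # every (name, value) pair, in original order
        "body": msg.get_payload(),  # text after the blank line
    }

post = split_post(RAW)
for name, value in post["headers"]:
    print(f"{name}: {value}")
```

Storing the `raw` field next to the parsed fields is what would make a "view raw" link possible later without re-importing the archive.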

Maybe rot13 the words you think you need to censor? That'd be in keeping with the usenet tradition at least from the mid-late 90s when I was reading/posting heavily. And maybe add a simple javascript ROT13 widget so people can easily reveal it? (There was a time in my life when I could read ROT13-ed things pretty accurately in my head.)
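
A rough sketch of the ROT13 idea in Python (server-side rather than the JavaScript widget mentioned above). The word list and the `obscure_flagged` helper are hypothetical placeholders, since the site's real filter list isn't given anywhere in this thread.

```python
import codecs
import re

# Hypothetical word list, for illustration only.
FLAGGED_WORDS = {"sucks", "pipe"}

def rot13(text):
    """Apply ROT13; applying it a second time restores the original."""
    return codecs.encode(text, "rot_13")

def obscure_flagged(text, flagged=FLAGGED_WORDS):
    """ROT13 only whole words found in the flagged list; leave the rest alone."""
    def swap(match):
        word = match.group(0)
        return rot13(word) if word.lower() in flagged else word
    # Matching whole runs of letters means "pipeline" is not touched by "pipe".
    return re.sub(r"[A-Za-z]+", swap, text)

print(obscure_flagged("This pipe sucks in public"))
```

Because ROT13 is its own inverse, the same `rot13` function is all a reader-side "reveal" button would need.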
Double rot13 just to be sure.
It's 2020 and we're under threat from state-level hackers. We need quadruple rot13!
:-)
I've decided to remove bad word filtering and all other censoring. Let's see how it goes.
Thank you :)
Epic!
One option is to censor by default for SEO, but have some checkbox that sets a cookie that uncensors it.
This would be a really cool solution, kind of the reverse of the typical SEO bombing techniques that hide spam pages on compromised sites.
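
The checkbox-plus-cookie idea could look roughly like this in Python. The cookie name `show_uncensored` and the `render_post` helper are assumptions made up for this sketch, not anything the site is known to use.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name that the "uncensor" checkbox would set.
UNCENSOR_COOKIE = "show_uncensored"

def wants_uncensored(cookie_header):
    """Return True only if the visitor has explicitly opted in via the cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(UNCENSOR_COOKIE)
    return morsel is not None and morsel.value == "1"

def render_post(raw_text, censored_text, cookie_header=""):
    """Serve the filtered text by default, the original text after opt-in."""
    return raw_text if wants_uncensored(cookie_header) else censored_text

print(render_post("original words", "filtered words"))
print(render_post("original words", "filtered words", "show_uncensored=1"))
```

Visitors without the cookie, which would include crawlers, get the filtered version; anyone who ticks the box gets the original.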
You should definitely get rid of whatever is being used currently. The first group I randomly clicked (alt.alien.visitors) was censoring the word "public" (and "sucks" and "pipe") multiple times in the same post. If that happens a lot, especially on innocuous words, it is really going to spoil what is an excellent project.

It's not a bad idea to filter content though, and/or have a flag button on threads/posts. 300 million articles from 40 years of an obscure and anarchic corner of the internet are bound to contain posts that are either potentially illegal or which you otherwise don't necessarily want to be publishing.

I've removed the filtering.
You can also remove the filter for users, but use site maps to make those posts not visible to search engines.
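
A small Python sketch of the sitemap approach, assuming each post carries a hypothetical `flagged` marker and using made-up example.org URLs. One caveat: leaving a URL out of the sitemap does not by itself stop a crawler from indexing a page it reaches through links, so the sketch also emits a robots `noindex` meta tag for flagged posts.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(posts):
    """posts: iterable of (url, flagged) pairs; flagged URLs are left out."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, flagged in posts:
        if flagged:
            continue  # keep flagged posts out of the sitemap
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

def robots_meta(flagged):
    """Tag to place in a flagged post's <head> so search engines skip it."""
    return '<meta name="robots" content="noindex">' if flagged else ""

# Hypothetical URLs, for illustration only.
posts = [("https://example.org/post/1", False), ("https://example.org/post/2", True)]
print(build_sitemap(posts))
print(robots_meta(True))
```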
Thanks. Much appreciated. This way we get to experience the colorful humans in full.
Are you planning to monetize? If not, then keep SEO out of it.