Hacker News
by nopuremore 4082 days ago
With little effort, you can run your dirty words through Google Translate into Spanish (copy and paste all the words) and obtain a filter for Spanish. Add synonyms for stronger filtering.
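A minimal sketch of that idea, assuming the translations have already been obtained by hand (no translation API is called here). The function name `build_translated_filter` and the placeholder words are hypothetical, not taken from the original project:

```python
def build_translated_filter(base_words, translations, synonyms=None):
    """Build a word filter for another language from an existing list.

    base_words:   iterable of words from the original filter
    translations: dict mapping an original word to its translated form
    synonyms:     optional dict mapping a translated word to extra variants
    """
    synonyms = synonyms or {}
    result = set()
    for word in base_words:
        translated = translations.get(word)
        if translated is None:
            continue  # no translation supplied for this word; skip it
        translated = translated.lower()
        result.add(translated)
        # Widen the filter with any synonyms of the translated word.
        result.update(s.lower() for s in synonyms.get(translated, []))
    return result


# Placeholder data only; a real list would come from the project's word file.
english = ["badword", "rudeword"]
spanish = {"badword": "palabrota", "rudeword": "groseria"}
extra = {"palabrota": ["taco"]}

spanish_filter = build_translated_filter(english, spanish, extra)
```

Machine translation of single words loses context, so the resulting list would still need a manual review.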

Perhaps "gay" is not a dirty word? (It is included in your dirty words, but gay people would presumably think otherwise.)

I'm gay, but I don't consider it offensive that the word is in there.

A lot of people use the term "gay" in conversation as a synonym for "that sucks"; a friend of mine does it all the time. I don't think they mean anything by it.

To differentiate between "I am gay" and "Oh, that's gay. I'm sorry that happened," you'd need an NLP system with a politeness preference.

Sorry, no offense intended, if anyone took it. In my use case, words such as 'gay' and 'lesbian' were, in almost all cases, used in explicit documents.

This is a very naive implementation to quickly get a handle on the number of porny documents. I intend to do some more work on clustering porny words. I think understanding sentiment would be hard and would involve a lot of labeled data, but it is potentially a very useful project.
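For illustration, a naive word-list check of the kind described could look roughly like the sketch below. This is not the author's actual code; the function names, the threshold parameter, and the placeholder words are all assumptions:

```python
import re


def count_flagged_words(text, word_list):
    """Count tokens in the text that appear in the word list."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for token in tokens if token in word_list)


def is_flagged(text, word_list, threshold=1):
    """Flag a document once it contains at least `threshold` listed words."""
    return count_flagged_words(text, word_list) >= threshold


def flagged_fraction(documents, word_list, threshold=1):
    """Return the share of documents that get flagged."""
    if not documents:
        return 0.0
    flagged = sum(is_flagged(doc, word_list, threshold) for doc in documents)
    return flagged / len(documents)


# Placeholder word list and documents, for illustration only.
words = {"badword", "rudeword"}
docs = ["a badword here", "clean text", "Rudeword and BADWORD"]
fraction = flagged_fraction(docs, words)
```

Matching whole tokens ignores context entirely, which is why a word with both neutral and explicit uses gets flagged either way. Raising `threshold` is one crude way to reduce false positives.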

It's okay! I wasn't offended. :-)

I didn't realise this was meant to filter out pornographic vocabulary, though; it makes more sense now.