by gauravm 4082 days ago
Sorry, no offense intended, if anyone took it. In my use case, words such as 'gay' and 'lesbian' were, in almost all cases, used in explicit documents.

This is a very naive implementation meant to quickly get a handle on the number of porny documents. I intend to do more work on clustering porny words. I think understanding sentiment would be hard and would require a lot of labeled data, but it is potentially a very useful project.


It's okay! I wasn't offended. :-)

I didn't realise this was meant to filter out pornographic vocabulary, though; it makes more sense now.