by gauravm
4082 days ago
Sorry, no offense intended, if anyone took it. In my use case, words such as 'gay' and 'lesbian' appeared almost exclusively in explicit documents. This is a very naive implementation to quickly get a handle on the number of porny documents. I intend to do some more work on clustering porny words. I think understanding sentiment would be hard and would require a lot of labeled data, but it is potentially a very useful project.
|
Although I didn't realise this was meant to filter out a pornographic vocabulary; it makes more sense now.