by nonsince
3353 days ago
Decent intro to natural language processing, but scientifically rubbish. I think using "cannabis" and "marijuana" skews heavily towards advocacy and serious discussion, since general chat about weed will use words such as, well, weed. Or pot, or hash, or herbs, or the maple leaf emoji, or a link to /r/trees. The problem with natural language processing on drugs is that the names people use for drugs are specifically chosen to be easily confused with another, innocuous usage. That's the entire point of street names: to hide the fact that you're talking about drugs. I think you would need some kind of AI leagues ahead of our technology to accurately analyse people colloquially chatting about any drug, let alone one as popular and ubiquitous as weed.
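To make the keyword problem concrete, here's a rough sketch. The comments are made up by me for illustration, and the `matches` helper and both term lists are my own assumptions, not anything from the article. Matching only the formal terms misses the casual chat, while adding slang drags in gardening posts:

```python
import re

# Hypothetical toy comments, invented for illustration only.
comments = [
    "Cannabis legalization should be on the ballot this year.",
    "Anyone got weed for the weekend?",
    "I planted herbs in a clay pot on the balcony.",
    "Smoked some pot last night, great time.",
]

def matches(text, terms):
    """Return True if any term appears in text as a whole word (case-insensitive)."""
    pattern = r"\b(" + "|".join(map(re.escape, terms)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

formal = ["cannabis", "marijuana"]          # assumed formal keyword list
slang = formal + ["weed", "pot", "herbs"]   # assumed slang additions

formal_hits = [c for c in comments if matches(c, formal)]
slang_hits = [c for c in comments if matches(c, slang)]

print(len(formal_hits), "of", len(comments), "matched by formal terms")
print(len(slang_hits), "of", len(comments), "matched once slang is added")
```

With this toy data the formal list only catches the advocacy-style comment, and the slang list catches everything, including the balcony herb garden.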
> For quality control, I looked only at comments with Reddit score > 100
That's a non-trivial popularity score. Also, since it's an absolute score, it will bias against smaller subreddits, where reaching 100 points on any comment is difficult.
This is much less "how people talk on Reddit", and much more "the type of comment that gets upvotes on the default subreddits".
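A quick sketch of the threshold bias, using invented subreddit names and scores (none of this is real Reddit data). The relative filter is just one possible alternative I'm assuming for comparison, not something the article does:

```python
from collections import defaultdict

# Hypothetical (subreddit, score) pairs, invented for illustration only.
comments = [
    ("bigsub", 950), ("bigsub", 400), ("bigsub", 120), ("bigsub", 15),
    ("smallsub", 40), ("smallsub", 22), ("smallsub", 9), ("smallsub", 3),
]

def absolute_filter(rows, threshold=100):
    """Keep comments whose score exceeds a fixed threshold."""
    return [r for r in rows if r[1] > threshold]

def relative_filter(rows, fraction=0.5):
    """Keep comments scoring more than `fraction` of their subreddit's top score."""
    top = defaultdict(int)
    for sub, score in rows:
        top[sub] = max(top[sub], score)
    return [(sub, score) for sub, score in rows if score > fraction * top[sub]]

abs_kept = absolute_filter(comments)
rel_kept = relative_filter(comments)

# Which subreddits survive each filter?
abs_subs = {sub for sub, _ in abs_kept}
rel_subs = {sub for sub, _ in rel_kept}
print("absolute:", sorted(abs_subs))
print("relative:", sorted(rel_subs))
```

With these made-up numbers the fixed cutoff drops the small subreddit entirely, while a per-subreddit cutoff keeps its top comments.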