Hacker News
by throwaway427 2684 days ago
I often scope Google searches for product reviews or food/recipe/diet/fitness knowledge to Reddit. This is not the sole research I do, but it does tend to turn up pretty high-quality data.
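For anyone unfamiliar with the trick: scoping a search like this usually just means appending Google's `site:` operator to the query. A minimal sketch of building such a search URL follows; the function name and the example query are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode


def reddit_scoped_search_url(query: str) -> str:
    """Build a Google search URL restricted to reddit.com.

    Hypothetical helper: it only appends the `site:` operator to the
    query and URL-encodes the result. It does not fetch anything.
    """
    scoped_query = f"{query} site:reddit.com"
    return "https://www.google.com/search?" + urlencode({"q": scoped_query})


if __name__ == "__main__":
    # Example query is made up for illustration.
    print(reddit_scoped_search_url("best budget rice cooker"))
```

Typing `best budget rice cooker site:reddit.com` straight into the search box does the same thing; the helper is only handy if you want to generate such links in bulk.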

I would not be surprised if Reddit is an ML/AI dataset gold mine.


Reddit's public-facing data is very useful for ML/AI.

On the data science end, you can do a lot to gauge important topics and user behavior: https://minimaxir.com/2018/09/modeling-link-aggregators/

On the silly AI end, I made a subreddit consisting only of text-generating RNNs: https://www.reddit.com/r/SubredditNN/

Reddit's internal data is even more robust.

>Reddit's public-facing data is very useful for ML/AI.

Assuming that the users generating the data are humans and not bots...

I take the same approach. Compared to something like Amazon reviews, the information tends to be more reliable.

I'm sure there are plenty of paid shills, but my sense is that most people who post on Reddit aren't trying to sell you something.