by qxf2 3540 days ago
The Reddit data set on BigQuery is excellent. My side project is tangentially related to the fact that the Reddit data set has normal folk commenting. I have been using Reddit comments to help writers research and find what normal people say about any topic [1]. So far, I have had little luck incorporating the comment scores and coming up with something more useful than the standard bag-of-words search techniques [2]. I am currently working on generating more interesting/creative writing prompts, again based on the Reddit data set.

One problem for data geeks to solve: Reddit data fits nicely into a graph structure and not so nicely in table form. It would be fantastic if someone put the Reddit data set into a graphdb and made it open.
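
To make the graph-versus-table point concrete, here is a minimal sketch, not an actual graphdb import. It assumes flat rows with `id`, `parent_id`, `link_id`, `score` and `body` columns, where `parent_id` carries a `t1_` prefix when the parent is a comment and a `t3_` prefix when it is the post itself. The sample rows and the helper names `build_reply_graph` and `thread` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows shaped like a flat comment table.
rows = [
    {"id": "c1", "parent_id": "t3_p1", "link_id": "t3_p1", "score": 12, "body": "Top-level comment"},
    {"id": "c2", "parent_id": "t1_c1", "link_id": "t3_p1", "score": 5, "body": "Reply to c1"},
    {"id": "c3", "parent_id": "t1_c2", "link_id": "t3_p1", "score": 2, "body": "Reply to c2"},
    {"id": "c4", "parent_id": "t3_p1", "link_id": "t3_p1", "score": 1, "body": "Another top-level comment"},
]


def build_reply_graph(rows):
    """Map each parent's full name to the full names of its direct replies."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append("t1_" + row["id"])
    return children


def thread(children, root, depth=0):
    """Depth-first walk yielding (depth, full name) for every node under root."""
    for child in children.get(root, []):
        yield depth, child
        yield from thread(children, child, depth + 1)


graph = build_reply_graph(rows)
walk = list(thread(graph, "t3_p1"))
for depth, name in walk:
    print("  " * depth + name)
```

In table form, each extra level of nesting costs another self-join on `parent_id`; once the edges are materialized, walking a whole thread is a single traversal, which is the kind of query a graphdb would handle natively.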

[1] https://wisdomofreddit.com and https://github.com/qxf2/wisdomofreddit

[2] For now, my search engine just uses Whoosh's out-of-the-box BM25F.
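
For anyone who wants to experiment with folding comment scores into the ranking, here is a rough sketch of one possible approach. It is not what wisdomofreddit does and it does not use Whoosh: it is a plain single-field BM25 written with the standard library, multiplied by a damped function of the Reddit score. The function names, the `weight` parameter, the `k1`/`b` values and the sample comments are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Plain BM25 over whitespace tokens (single field, so not BM25F)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            total += idf * tf[term] * (k1 + 1) / norm
        scores.append(total)
    return scores


def blended(query, comments, weight=0.1):
    """Boost the text score by log(1 + votes); comments are (body, votes) pairs."""
    text = bm25_scores(query, [body for body, _ in comments])
    return [
        t * (1 + weight * math.log1p(max(votes, 0)))
        for t, (_, votes) in zip(text, comments)
    ]


# Hypothetical comments: identical text, different vote counts, plus a non-match.
comments = [("coffee is great", 100), ("coffee is great", 0), ("tea only", 500)]
ranked = blended("coffee", comments)
```

The boost is multiplicative so that a highly upvoted comment that does not match the query still scores zero, and the logarithm keeps a handful of very popular comments from swamping the text relevance. The sketch assumes at least one non-empty document; whether this kind of blend actually beats plain BM25F on real queries is an open question.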