Hacker News
by minimaxir 3298 days ago
Hmm, the BigQuery HN dataset is now updated daily and contains comments as well as stories? That's new, and I'll certainly give it another look for my projects.

With the bigrquery R package (https://github.com/rstats-db/bigrquery), you can query the HN dataset directly from R, even with dplyr syntax (at least for simple queries; for complex ones you can pass raw SQL).

As noted, the resulting dataset of words is large, so splitting the text into words in BigQuery itself may be more practical (using a combo of SPLIT and UNNEST in standard SQL), although of course you can't do complex operations like logistic regression or splines there.
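A rough sketch of the SPLIT + UNNEST idea. The SQL string assumes a `bigquery-public-data.hacker_news.comments` table with a `text` column; both names are assumptions, so check the current schema before running it. The Python functions don't talk to BigQuery at all: they just emulate locally what `UNNEST(SPLIT(LOWER(text), ' '))` does to each row (one output row per token) plus the `WHERE`/`GROUP BY` count, which is handy for sanity-checking the tokenization on a few comments before paying for a full scan.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical BigQuery standard-SQL query; the table and column names are
# assumptions and should be checked against the dataset's current schema.
WORD_COUNT_SQL = """
SELECT word, COUNT(*) AS n
FROM `bigquery-public-data.hacker_news.comments`,
  UNNEST(SPLIT(LOWER(text), ' ')) AS word
WHERE word != ''
GROUP BY word
ORDER BY n DESC
"""


def split_unnest(
    rows: Iterable[Tuple[int, Optional[str]]], delimiter: str = " "
) -> Iterator[Tuple[int, str]]:
    """Local stand-in for UNNEST(SPLIT(LOWER(text), delimiter)).

    Yields one (id, word) pair per token, like the comma cross join with UNNEST.
    """
    for row_id, text in rows:
        if text is None:
            # SPLIT(NULL) is NULL, and UNNEST of NULL yields no rows.
            continue
        # Like SPLIT, str.split with an explicit delimiter keeps empty tokens
        # produced by repeated delimiters.
        for word in text.lower().split(delimiter):
            yield row_id, word


def count_words(rows: Iterable[Tuple[int, Optional[str]]]) -> Counter:
    """Local stand-in for WHERE word != '' ... GROUP BY word with COUNT(*)."""
    return Counter(word for _, word in split_unnest(rows) if word != "")
```

The point of pushing this into BigQuery is that only the small aggregated (word, count) table has to come back into R, where the logistic regression or spline work can happen.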