Hacker News
by mirimir 2673 days ago
Indeed. Many years ago, I "ran SQL" on a couple of decades of Usenet newsgroup data. Extraction and manipulation involved a bunch of grep, sed, tr and awk (and millions of tmp files). But, as with PDFs of utility bills, it took very specific regexes.
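As a rough illustration of that kind of regex-driven extraction, here is a minimal Python sketch that pulls a few headers out of raw Usenet articles and flattens them into CSV that a SQL engine could load as a table. It is not the original pipeline, which used grep, sed, tr and awk. It assumes each article is a string with RFC 1036-style headers that end at the first blank line. The chosen header set and the function names `extract_headers` and `articles_to_csv` are hypothetical, and folded (multi-line) headers are not handled.

```python
import csv
import io
import re

# Hypothetical header selection; real archives carry many more headers.
HEADER_RE = re.compile(r"^(From|Newsgroups|Date|Subject):\s*(.*)$", re.MULTILINE)
FIELDS = ["From", "Newsgroups", "Date", "Subject"]


def extract_headers(article: str) -> dict:
    """Return the selected headers from one article's header block."""
    # Headers end at the first blank line. Searching only that block means
    # body lines that happen to look like headers (e.g. "From: ...") are ignored.
    header_block = article.split("\n\n", 1)[0]
    # Folded headers (continuation lines) are NOT handled in this sketch.
    return {name: value.strip() for name, value in HEADER_RE.findall(header_block)}


def articles_to_csv(articles: list[str]) -> str:
    """Flatten articles into CSV text; missing headers become empty fields."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for article in articles:
        writer.writerow(extract_headers(article))
    return out.getvalue()
```

The csv module takes care of quoting values such as dates that contain commas, which is one of the fiddly parts when doing the same job with sed and awk.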

Hey, Kshitij from Rockset here.

With Rockset, you can skip the ETL step for extracting and manipulating the data. The main value, though, is that you can use SQL to join this data with other datasets in JSON, CSV, XLS or Parquet format to help with analysis.
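To make the idea of "joining data from different formats with SQL" concrete, here is a minimal local stand-in using Python's built-in sqlite3 module. It is not Rockset's API and does not show how Rockset ingests data; unlike Rockset, it declares table schemas up front. The table and column names (`posts`, `groups`, `author`, `newsgroup`, `category`) and the function `join_json_and_csv` are hypothetical, and the JSON input is assumed to be a single array of objects.

```python
import csv
import io
import json
import sqlite3


def join_json_and_csv(json_text: str, csv_text: str) -> list[tuple]:
    """Load JSON records and CSV rows into SQL tables, then join them."""
    conn = sqlite3.connect(":memory:")
    # Schemas are declared explicitly here (an assumption of this sketch).
    conn.execute("CREATE TABLE posts (author TEXT, newsgroup TEXT)")
    conn.execute("CREATE TABLE groups (newsgroup TEXT, category TEXT)")

    # JSON side: an array of objects; named placeholders read keys from each dict.
    conn.executemany(
        "INSERT INTO posts VALUES (:author, :newsgroup)", json.loads(json_text)
    )
    # CSV side: DictReader yields one dict per row, keyed by the header line.
    conn.executemany(
        "INSERT INTO groups VALUES (:newsgroup, :category)",
        csv.DictReader(io.StringIO(csv_text)),
    )

    # Inner JOIN: posts whose newsgroup has no row in the CSV are dropped.
    result = conn.execute(
        """
        SELECT p.author, g.category
        FROM posts AS p
        JOIN groups AS g ON p.newsgroup = g.newsgroup
        ORDER BY p.author
        """
    ).fetchall()
    conn.close()
    return result
```

For example, posts by `alice` in `comp.lang.c` and `bob` in `rec.arts.sf`, joined against a CSV mapping those groups to categories, come back as `(author, category)` pairs.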

Maybe you could add modules for extracting and manipulating data from popular sources, such as the most popular social media sites, Amazon, Craigslist, eBay, etc., and the main search engines.

There are many people who want usable data from such sources. And your service wouldn't be doing any scraping, so you'd probably be OK legally. But IANAL, so do check.