HN Mirror


by eatonphil 1521 days ago

Fair point! I don't mean to say jq performance can't or shouldn't be improved.

Just that jq does two things: 1) ingest and 2) query.

If you're doing a bunch of exploration on a single dataset in one sitting, or if the dataset is large and you're only selecting subsets of it, you can ingest the data into a database (and optionally toggle indexes).

Then you can query as many times as you want and not worry about ingest again until your data changes.

All three of the tools I listed have some variation of this kind of data caching built in. For dsq and q with caching turned on, repeat queries against files with the same hashsum run only against data already in SQLite, with no re-ingestion.
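To make the ingest-once, query-many idea concrete, here is a minimal Python sketch of hash-keyed caching. It is not how dsq or q actually implement it; the function names and the `t_<hash>` table naming are made up for illustration, and it assumes the input file is a non-empty JSON array of flat objects with scalar values and simple key names.

```python
import hashlib
import json
import sqlite3
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ensure_ingested(conn: sqlite3.Connection, path: Path) -> tuple[str, bool]:
    """Load the file into SQLite unless this exact content is already cached.

    Returns (table_name, ingested); ingested is False on a cache hit.
    """
    # Hypothetical naming scheme: one table per distinct file content.
    table = "t_" + file_digest(path)[:16]
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists:
        return table, False  # same hashsum: skip ingestion entirely

    # Assumption: a non-empty JSON array of flat objects with scalar values.
    rows = json.loads(path.read_text())
    columns = sorted({key for row in rows for key in row})
    column_list = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_list})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
    return table, True
```

The first call for a given file pays the ingest cost; later calls with unchanged content return the same table name right away, so every further `SELECT` runs against data already in SQLite. Changing the file changes the digest, which triggers a fresh ingest into a new table.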

1 comment

jeffbee 1521 days ago

I have a large GeoJSON dataset I analyze to answer local government questions. It is of course loaded into a database for common questions, but I also find myself doing ad hoc queries that aren't suited to the database structure, and that's where I find myself waiting for jq. I also use jq as the ETL for that database.
