by _delirium 4163 days ago
If it really is just a one-shot with one simple-ish filter, I agree. But I often find myself incrementally building shell-pipeline tangles that run massively faster once replaced with SQLite. Once your processing pipeline is making liberal use of the sort/grep/cut/tee/uniq/tac/awk/join/paste suite of tools, things get slow. The tangle of Unix tools effectively does a full scan of the data at every stage, with no indexes to help. It gets especially bad if you have to re-sort the data at different stages of the pipeline (e.g. on different columns), or split columns apart and re-join them later in the pipeline. In that kind of scenario a database (at least SQLite; I haven't tried a more "heavyweight" database) ends up being a win even for stream-processing tasks. You pay for a load/index step up front, but you more than get it back if the pipeline is nontrivial.
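A minimal sketch of the load/index-then-query approach, using Python's built-in `sqlite3` module. Everything here is hypothetical: the two tiny TSV inputs, the table names, and the queries stand in for whatever files a real pipeline would chew on. The first query replaces something like a `join` on user id, a `grep` for clicks, a `cut` of the country column, then `sort | uniq -c | sort -rn`. The second regroups the same data on a different column, with no re-sort of the input files. At this size nothing is faster; the point is only the shape: one load, a couple of indexes, then each "stage" becomes a query.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two TSV files.
USERS_TSV = "1\talice\tus\n2\tbob\tde\n3\tcarol\tus\n"
EVENTS_TSV = "1\tclick\n1\tview\n2\tclick\n3\tclick\n3\tclick\n"


def load(conn, table, columns, text):
    """Up-front load step: copy tab-separated text into a new table."""
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def build_db():
    conn = sqlite3.connect(":memory:")
    load(conn, "users", ["id", "name", "country"], USERS_TSV)
    load(conn, "events", ["user_id", "action"], EVENTS_TSV)
    # Up-front index step: later joins use these instead of sorting the inputs.
    conn.execute("CREATE INDEX users_id ON users(id)")
    conn.execute("CREATE INDEX events_user_action ON events(user_id, action)")
    return conn


def clicks_per_country(conn):
    # Roughly: join | grep click | cut country | sort | uniq -c | sort -rn
    return conn.execute(
        """
        SELECT u.country, COUNT(*) AS clicks
        FROM events e JOIN users u ON u.id = e.user_id
        WHERE e.action = 'click'
        GROUP BY u.country
        ORDER BY clicks DESC, u.country
        """
    ).fetchall()


def events_per_user(conn):
    # Same data grouped and ordered on a different column, no re-sort step.
    return conn.execute(
        """
        SELECT u.name, COUNT(*)
        FROM users u JOIN events e ON e.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.name
        """
    ).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(clicks_per_country(db))
    print(events_per_user(db))
```

With this sample data, `clicks_per_country` returns `[('us', 3), ('de', 1)]`. Whether the load/index cost pays off on real data depends on the size and on how many stages reuse the loaded tables.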