Hacker News
by jssmith 2993 days ago
As a command-line tool it could serve as a more powerful replacement for the Unix utilities that I often use for simple analysis jobs: sort, uniq, wc, awk, sometimes cut and sed. I'm going to give it a try.

It looks like textql loads everything into an in-memory SQLite instance, whereas I'd really like to see an approach that uses SQLite's virtual table mechanism (https://sqlite.org/vtab.html), which would avoid the loading step and perhaps make streaming processing possible.
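
To make the "load everything first" approach concrete, here is a rough Python sketch of the idea. It is not textql's actual code; the `load_csv_into_memory` helper, the `data` table name, and the sample rows are all made up for illustration. It only uses the standard `csv` and `sqlite3` modules.

```python
import csv
import io
import sqlite3


def load_csv_into_memory(csv_text, table="data"):
    """Load CSV text (first row is the header) into an in-memory SQLite table.

    Hypothetical helper for illustration; every row is inserted before any
    query can run, which is the loading step discussed above.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    # Quote identifiers so odd column names do not break the DDL.
    cols = ", ".join('"{}"'.format(c.replace('"', '""')) for c in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
    placeholders = ", ".join("?" for _ in header)
    # Assumes every row has the same number of fields as the header.
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    return conn


SAMPLE = "word,source\nfoo,a\nbar,a\nfoo,b\n"  # made-up sample data
conn = load_csv_into_memory(SAMPLE)
counts = conn.execute(
    "SELECT word, COUNT(*) FROM data GROUP BY word ORDER BY COUNT(*) DESC, word"
).fetchall()
print(counts)
```

The query does roughly the job of a `sort | uniq -c | sort -rn` pipeline on the first column, but only after the whole input has been copied into the database.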

What you describe would be a super nice utility. Great for prototyping and development, as the code could be copy & pasted from such a tool into application code.

Streaming would make it memory efficient, and possibly able to handle some big data: maybe not true "Big Data", but certainly tens of gigabytes.

Anyone want to put this idea up on GoFundMe?

SQL is set-oriented. How would that work on a potentially indefinite stream, other than as a simple filter, which you could just do with a tool such as awk?
Many relational operations don't require a whole stream to compute, and many of those that do don't need it all at once.

Projection (mapping), filtering, and a join against a fully loaded other side all work on a stream.
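
A minimal Python sketch of those three operations over a stream, assuming the small side of the join fits in a dict. The names `stream_join` and `endless_orders`, and the data, are invented for the example.

```python
import itertools


def stream_join(rows, lookup):
    """Filter, inner-join and project one row at a time.

    `rows` may be an unbounded iterator of (user_id, amount) pairs;
    `lookup` is the fully loaded other side, a dict of user_id -> name.
    """
    for user_id, amount in rows:
        if amount <= 0:          # filtering (WHERE amount > 0)
            continue
        name = lookup.get(user_id)
        if name is None:         # inner join: drop rows with no match
            continue
        yield name, amount       # projection (SELECT name, amount)


def endless_orders():
    """Made-up infinite stream of (user_id, amount) pairs."""
    for i in itertools.count():
        yield i % 3, i - 1


users = {0: "ann", 1: "bob"}  # the fully loaded side; user 2 is absent
first_three = list(itertools.islice(stream_join(endless_orders(), users), 3))
print(first_three)
```

Nothing here ever holds more than one stream row plus the lookup table, so the input never has to end.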

Aggregation can consume an indefinite stream with limited working set if the cardinality of the grouping key isn't large.
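
As a sketch of what that could look like (in Python, with invented names and data), a running SUM per group keeps one accumulator per distinct key and emits the updated total as each row arrives:

```python
from collections import defaultdict


def running_totals(rows):
    """Streaming GROUP BY key with SUM(value).

    State is one integer per distinct key, so memory is bounded by the
    cardinality of the grouping key, not by the length of the stream.
    Yields (key, total_so_far) after every input row.
    """
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
        yield key, totals[key]


events = [("a", 1), ("b", 2), ("a", 3)]  # stand-in for an unbounded stream
updates = list(running_totals(events))
print(updates)
```

A final answer only exists if the stream ends or is windowed; otherwise the consumer just sees a series of updated totals.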

And of course you can combine these in nested and unioned operations, computing across multiple indefinite streams concurrently and with limited working set.

It would be tricky to make this work effectively without hints for things like joins, for sure; join order is one of the hardest things a query engine optimizes.

I think this might be helpful context:

https://calcite.apache.org/docs/stream.html

It's a good idea to be able to play with a small data set to see how things would work if it were in a SQL database.

I usually have to develop databases on Windows with Access. Having one more tool to work with on Linux is a good idea.

I used to use awk and sed before.