As people always do in such threads, I’d too like to chime in with alternatives I’ve been a co-author of, as this is an area I’m very interested in.
The first one, which is definitely close in spirit to this project is OctoSQL[1]. We too are trying to query all the things with SQL, including JSON, though we extended the idea to supporting more data sources and file formats, with the ability to mix and match in joins and subqueries. Curious what direction this project will take though, as we’re now heading for streaming sources (like Kafka) in pure SQL!
A totally different one is jql[2] where I’ve been exploring more of a lispy continuation passing style based approach to a JSON query language, as I didn’t really feel like the current ones are ergonomic.
If you like this, make sure to check them out too! (And all the others people will be posting, as I think they all are fascinating! This is a big area with lots of space for innovation left.)
No, though people have contributed R and Python bindings already. So one may hope :) though Go is fairly unfriendly to binding from other languages (because of the runtime overhead).
We ourselves mainly work on the command line interface and concentrate on it.
I've been experimenting with the combo of the "jq" JSON query language and SQL via a custom SQLite function. I bundled the result as a plugin for my Datasette project: https://github.com/simonw/datasette-jq
Apache Drill [1] is alternative, using ANSI SQL. Beside json, it support JDBC, csv, parquet, file system traversal, HBase, Mongo, S3... I believe it also support joins across different sources, although I haven't tried that.
EDIT: to add context to the story - Apache Drill is running on JVM and can be embedded, so it can be run from Scala code as well.
I recently published a github project with the purpose mentioned in the post's title. Obviously if you just want to parse JSON, there are much better Scala libraries, like Circe. However in my case, I had to filter responses from a very dynamic api that I couldn't change in any way (for example to give me prefiltered results). My first choice was to use JsonXPath Java lib, but I didn't like that XPath-like notation, so I figured that some people may benefit from using what most of us already know - SQL - to create them queries. In my project, it was supposed to be configurable - the architect wanted to enable the not so technical business bunch to query json documents almost without programmers help and the easiest way to do it was to put their queries into the database and them allow them to match a query against an api. I left the project some time ago, but the idea I had back them kept me awake at night and here it is. Hopefully not entirely very bad ;) If you like it, I could really appreciate some help with development of the rest of some even cooler functionalities.
After having used jq for some heavy data processing (building an entire Markdown document from a huge slab of JSON with no external tools), I am inclined to agree.
However, I think this may be indicative of the task you are trying to complete rather than the tool itself. At a certain level of complexity your resulting code will just look bad regardless of which language you use. One thing jq does to make this easier is providing the pipe operator, so you can break a complex task into stages: A | B | (C .... | C.5) | D | E...
I figured, I'll give an update to anyone who's still interested. I managed to fix a few issues and and some major functionalities (eg: * symbol to select all) over the last days. The next thing I'll work on will be type queries, so that the user will be able to use queries as strings but also as a chain of method calls with a type parameter representing the model returned from the query. Something like: from(jsonObject).where(list of filter functions).select[CaseClassType]() . Stay tuned!
SQL isn't the good part about relational databases, its the relational model that is the good part. The ability to query your data from multiple different angles.
In saying that, I guess its still better than having yet another custom query language a la mongo.
I don't agree :). SQL nicely abstracts what would have been more difficult or technical to do otherwise. Yep, it was create in order to enable non strictly technical people to query their data, but does it make it bad? I don't think so. Also authors of great many NoSql databases (and other technologies, ever heard about the widely loved LINQ technology in C#?) think, giving their users the possibility to query data using what they already know. Also: can you honestly, from the bottom of your heart, say that you'd rather filter json documents using some form of XPath? Besides: I'm not trying to criticize relational model, as well as I'm not saying that JSON is the way to go when storing data :D I just wanted to create a tool for people that are trying to implement a very specific business case I described and, due to lack of tools, are having some issues with it.
This stringified approach loses type safety and IDE assistance for what benefit? Why do we want it to look like SQL? A Slick/Quill-like approach seems much safer.
Please read my description (found somewhere in this discussion). That being said: there are scenarios when being able to save a query into a database or config file, then load and run it is useful. In general I agree though and that’s why one of the enhabcements I want to implement is to create a method based syntax with projecting selected fields to a case class.
As for the „why do we want it to look like sql” - I like the syntax and I think it’s much more readable than any other language used to process data. But that’s just a matter of taste :)
Yes, you can run apache spark as a single node really easily. Then once its running you can fire up the Scala or Python shell that it comes with. After that, it is just a matter of issuing the statements to setup the data set then issues queries against it.
Still, it seems like a lot of work. My case was that I just wanted to feed JSON document into some mechanism that will give me back what I want. I wouldn't like to embed whole Spark framework into my app. I will read about it though :)
The first one, which is definitely close in spirit to this project is OctoSQL[1]. We too are trying to query all the things with SQL, including JSON, though we extended the idea to supporting more data sources and file formats, with the ability to mix and match in joins and subqueries. Curious what direction this project will take though, as we’re now heading for streaming sources (like Kafka) in pure SQL!
A totally different one is jql[2] where I’ve been exploring more of a lispy continuation passing style based approach to a JSON query language, as I didn’t really feel like the current ones are ergonomic.
If you like this, make sure to check them out too! (And all the others people will be posting, as I think they all are fascinating! This is a big area with lots of space for innovation left.)
[1]: https://github.com/cube2222/octosql
[2]: https://github.com/cube2222/jql