by mmalek06 2310 days ago
I didn’t know about it. Is it possible to use only that subset of the Apache Spark library?
3 comments

I imagine you could, with some work, reuse Catalyst to generate queries against JSON-derived Datasets. Maybe this is a starting point for some reading? https://github.com/apache/spark/blob/e2d3983de78f5c80fac066b...
If you need to parse semi-structured documents using Spark, Quenya-DSL is much better than SQL, especially if you want to flatten the data.

https://github.com/music-of-the-ainur/quenya-dsl

Yes, you can run Apache Spark as a single node really easily. Once it's running, you can fire up the Scala or Python shell that it comes with. After that, it is just a matter of issuing the statements to set up the dataset and then issuing queries against it.
Not many people know this, but Databricks offers https://community.cloud.databricks.com/ for free, which lets you run simple Spark notebooks.

Disclosure: I work for Databricks, but not on Spark.

Still, it seems like a lot of work. In my case, I just wanted to feed a JSON document into some mechanism that would give me back what I want. I wouldn't want to embed the whole Spark framework in my app. I will read about it though :)