Hacker News new | ask | show | jobs
by mumblemumble 2147 days ago
FWIW, while the JVM isn't completely irrelevant in data, I will say, even as a big user of Spark via Scala, that JVM languages are quickly becoming irrelevant in data. Spark's Scala API is simultaneously the core of the platform, and also very much a second-class citizen that lacks a lot of important features that the Python API has. Easy interop with a good math library, for example.

Similarly, the reference implementation of Parquet may be in Java, but consuming it from a Java language, outside of a Spark cluster, is still a royal pain. Whereas doing it from Python isn't too bad.

Long story short, I think that expecting a project that's just trying to implement a columnar memory format to also muck out the world's filthiest elephant pen is perhaps asking too much. Though perhaps a project like Arrow could serve as the cornerstone of an effort to douse it all with kerosene and make a fresh start.

1 comments

I spent a couple of years doing consultancy for life sciences research labs, most people were just using Excel and Tableau, plugged into OLAP, SQL servers, alongside Java and .NET based stores.

Stuff like Arrow doesn't come even into the radar of IT.