by mumblemumble
2147 days ago
FWIW, while the JVM isn't completely irrelevant in data, I will say, even as a big user of Spark via Scala, that JVM languages are quickly becoming irrelevant in data. Spark's Scala API is simultaneously the core of the platform and very much a second-class citizen that lacks a lot of important features the Python API has: easy interop with a good math library, for example. Similarly, the reference implementation of Parquet may be in Java, but consuming it from a JVM language, outside of a Spark cluster, is still a royal pain, whereas doing it from Python isn't too bad. Long story short, I think that expecting a project that's just trying to implement a columnar memory format to also muck out the world's filthiest elephant pen is perhaps asking too much. Though perhaps a project like Arrow could serve as the cornerstone of an effort to douse it all with kerosene and make a fresh start.
Stuff like Arrow doesn't even come onto the radar of IT.