Hacker News new | ask | show | jobs
by spenrose 2466 days ago
"the first implementation was in Scala, which was chosen because of its algebraic data types and powerful pattern matching. This made writing the optimizer, which is the core part of the compiler, very easy. Our original optimizer was based on the design of Catalyst, which is Spark SQL’s extensible optimizer. We moved away from Scala because it was too difficult to embed a JVM-based language into other runtimes and languages."

There is an important "contemporary history of computing" article to write about the evolution of the Spark project from "let's build a distributed filesystem for MapReduce in Java because we read those early Google papers" to "SQL is the right model for working with data so DataFrames" to "meet data scientists where they are: Python (and R)" to "make machine learning easy" and now to "LLVM, but for crunching big numeric arrays".

3 comments

Not that you should replace Rust with Haskell, but Haskell would've been a better choice than Scala.

It has its own runtime, but it's not difficult to call Haskell code from C or ATS or whatever.

Not difficult to call from C? How does that work, exactly? Wouldn't you need to properly setup the whole runtime (incl. GC) first?
Yes but I think the parent meant its 'just' an #include and ghc_init() away.
Very interesting. Do you have any references to share relevant to the article you suggest to be written?
No, I based on attending SparkConf between 2015 and 2017. You could probably assemble half of it just by reading summaries of Matei's keynotes.
Ah, thanks for the suggestion.
> it was too difficult to embed a JVM-based language into other runtimes and languages

In addition to the JVM, Scala has had JS [1] and native (via LLVM) [2] targets for years.

(And that's not even mentioning any second-order compilations; e.g. Scala -> JVM bytecode -> native)

There's a number of reasons to not choose Scala, but portability is far from one of them.

[1] https://www.scala-js.org

[2] http://www.scala-native.org

The library ecosystem is different between JVM/JS/Native. Porting across runtimes may require more work than just changing the compilation target.
That's true. If you want it across all platforms, you are restricted to the Scala ecosystem, and can't use Java and JS ecosystems.