by vvladymyrov
2620 days ago
I saw the announcement about .NET interop support in Apache Spark some time ago.
The benchmarks are interesting and tell the story: in a few cases it is faster than Python, but slower than Scala/JVM, which is native for Spark. Maybe with Arrow interchange Python's performance would improve (and likewise for other interop layers that would use Arrow, e.g. .NET). But performance is not the only thing; there is also the ability to debug issues. For that you still need to dig into the Apache Spark core, which is written in Scala. This .NET implementation would be a "gateway drug" for moving your production to Scala/JVM. It happened to me with PySpark: the majority of tasks at hand can be solved with PySpark, but digging into issues and stack traces brought me to the Scala internals of Apache Spark.
As a result, when Python-specific libraries are not needed and high performance is, I would write Spark programs in Scala from the beginning.