by chiachun 4019 days ago
The release notes: https://spark.apache.org/releases/spark-release-1-4-0.html

Another major change is that it supports Python 3 now. https://issues.apache.org/jira/browse/SPARK-4897

1 comment

They've integrated Tungsten / native sorting into shuffle and observed some decent speedups:

* https://issues.apache.org/jira/browse/SPARK-7081

* https://github.com/apache/spark/pull/5868#issuecomment-10183...

However, I guess reduceByKey (and friends) don't benefit yet.

Their SGD implementation still uses treeAggregate ( https://github.com/apache/spark/blob/e3e9c70384028cc0c322cce... ), so I wonder when they're planning to add some of the "Parameter Server" stuff (e.g. butterfly mixing or Kylix: http://www.cs.berkeley.edu/~jfc/papers/14/Kylix.pdf ).
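
To make the comparison concrete, here is a rough sketch in plain Python (no Spark involved) of the two communication patterns being contrasted. It is not Spark's code or the algorithm from the Kylix paper. `tree_aggregate`, `butterfly_allreduce`, `add_vectors`, and the `fan_in` parameter are hypothetical names made up for illustration. The first function merges per-partition partial gradients level by level until a single result is left, which is the general idea behind a tree-shaped aggregation. The second is a butterfly-style (recursive-doubling) allreduce, assuming a power-of-two number of nodes, where every node ends up holding the full sum instead of only the driver.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def tree_aggregate(
    partials: Sequence[T],
    combine: Callable[[T, T], T],
    fan_in: int = 2,  # hypothetical knob: how many partials merge per tree node
) -> T:
    """Combine per-partition partial results level by level into one value."""
    if not partials:
        raise ValueError("need at least one partial result")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        next_level = []
        # Merge each group of up to `fan_in` neighbours into one result.
        for start in range(0, len(level), fan_in):
            group = level[start:start + fan_in]
            acc = group[0]
            for item in group[1:]:
                acc = combine(acc, item)
            next_level.append(acc)
        level = next_level
    return level[0]


def butterfly_allreduce(values: List[float]) -> List[float]:
    """Simulate a recursive-doubling allreduce over len(values) nodes.

    In the round with offset `step`, node i adds the value held by node
    i XOR step. After log2(n) rounds every node holds the global sum.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two node count")
    current = list(values)
    step = 1
    while step < n:
        # Every node exchanges with its partner and sums, all in parallel.
        current = [current[i] + current[i ^ step] for i in range(n)]
        step *= 2
    return current


def add_vectors(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum of two gradient vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Three partitions, each holding a partial gradient.
    gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(tree_aggregate(gradients, add_vectors))
    # Four nodes, each holding one scalar.
    print(butterfly_allreduce([1.0, 2.0, 3.0, 4.0]))
```

The practical difference: with the tree, only the root ends up with the summed gradient and has to broadcast updated weights back out, while with the butterfly pattern every node already has the sum after log2(n) exchange rounds.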