by choppaface 4021 days ago
They've integrated Tungsten / native sorting into shuffle and observed some decent speedups:

* https://issues.apache.org/jira/browse/SPARK-7081

* https://github.com/apache/spark/pull/5868#issuecomment-10183...

However, I guess reduceByKey (and friends) don't benefit yet.

Their SGD implementation still uses treeAggregate ( https://github.com/apache/spark/blob/e3e9c70384028cc0c322cce... ), so I wonder when they're planning to add some of the "Parameter Server" stuff (e.g. butterfly mixing or Kylix http://www.cs.berkeley.edu/~jfc/papers/14/Kylix.pdf ).
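
For anyone unfamiliar with the tree aggregation pattern, here is a rough plain-Python sketch of the idea (no Spark involved). It is not Spark's actual code: `tree_aggregate`, `gradient_sum`, and the `fan_in` parameter are names I made up for illustration, and the 1-D least-squares gradient is just a toy stand-in for whatever the real SGD computes.

```python
from functools import reduce


def tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=2):
    """Fold each partition locally, then merge the partials level by level."""
    # Step 1: each "partition" folds its own records into one partial result.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    if not partials:
        return zero
    # Step 2: merge partials in groups of `fan_in` until one remains, so no
    # single node has to combine every partial result at once.
    while len(partials) > 1:
        partials = [
            reduce(comb_op, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    return partials[0]


def gradient_sum(partitions, w):
    """Sum of toy least-squares gradients (w*x - y)*x for a model y ~ w*x."""
    seq_op = lambda acc, xy: acc + (w * xy[0] - xy[1]) * xy[0]
    comb_op = lambda a, b: a + b
    return tree_aggregate(partitions, 0.0, seq_op, comb_op)
```

The point of the multi-level merge is that the final combine step sees only a handful of partial results instead of one per partition. It is still an all-to-one reduction though, which is why the parameter-server style approaches are interesting as a next step.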