by mastratton3 3682 days ago
So I did find it useful for additional exploratory aggregations once the data was already cleaned and denormalized. My comment was directed more at the upfront data processing (in our case, extracting time series data out of a large number of files).

I did hit issues w/ multiple joins and shuffling though. Have you not hit issues w/ shuffling?

I was using Spark 1.5.1 for the record.

1 comment

Have you tried tuning Spark's memory parameters?
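
For reference, a rough sketch of the kind of settings this usually means on Spark 1.5.x, which still uses the legacy memory model with separate shuffle and storage fractions. The property names are real Spark 1.5 settings, but every value below is an illustrative assumption rather than a recommendation, and `job.py` is a hypothetical script name:

```python
# Spark 1.5.x memory and shuffle settings (legacy memory model).
# All values are illustrative assumptions; tune them for your own workload.
spark_conf = {
    "spark.executor.memory": "8g",           # heap per executor
    "spark.shuffle.memoryFraction": "0.4",   # default 0.2; more room for shuffle buffers
    "spark.storage.memoryFraction": "0.4",   # default 0.6; less room for cached data
    "spark.sql.shuffle.partitions": "400",   # default 200; smaller partitions per join
}


def to_submit_args(conf):
    """Render a config dict as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(spark_conf)
# job.py is a hypothetical application script.
print(" ".join(["spark-submit"] + submit_args + ["job.py"]))
```

Shifting memory from storage to shuffle and raising the partition count tends to help when joins spill heavily, though whether it helps here depends on the data.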