HN Mirror


by mastratton3 3682 days ago

So I did find it useful for additional exploratory aggregations once the data was already cleaned and denormalized. My comment was directed more at the upfront data processing (in our case, extracting time series data out of a large number of files).

I did hit issues w/ multiple joins and shuffling though. Have you not hit issues w/ shuffling?

I was using Spark 1.5.1 for the record.

1 comment

tma-1 3682 days ago

Have you tried tuning Spark's memory parameters?
