by vlahmot 3177 days ago
We run the EMR-backed-by-S3 setup too, only with Snappy instead of gzip, since gzip files can't be split.

Ah, word, do you roll up the data by day? Or by hour? I'd think that if you roll it up by hour and end up with a lot of files, the work can be spread out pretty evenly across a large cluster.
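
To make the hourly idea concrete, here's a rough sketch of what hourly rollup could look like as S3 key prefixes. Everything specific here is made up for illustration: the bucket name, the `dt=`/`hour=` layout, and the `hourly_prefixes` helper are assumptions, not anything from the setup described above.

```python
from datetime import datetime, timedelta

def hourly_prefixes(start, end, base="s3://example-bucket/events"):
    """Return one prefix per hour in [start, end).

    The bucket and the dt=/hour= layout are hypothetical placeholders.
    """
    # Truncate to the top of the hour so a partial first hour still gets a prefix.
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current < end:
        prefixes.append(f"{base}/dt={current:%Y-%m-%d}/hour={current:%H}/")
        current += timedelta(hours=1)
    return prefixes

day = hourly_prefixes(datetime(2016, 1, 1), datetime(2016, 1, 2))
print(len(day))  # 24
print(day[0])    # s3://example-bucket/events/dt=2016-01-01/hour=00/
```

With a layout like that, one day gives you 24 separate prefixes to hand out as input instead of one big daily blob, which is roughly why I'd expect hourly files to spread more evenly.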