by mcroydon 5528 days ago
Thanks for the tip, I didn't think to copy the data to ephemeral storage like that. That'll probably speed things up a lot.

I ended up splitting the data into a relatively small number (~200) of ~30MB gzipped files so that the mappers are saturated from the start, which speeds things up. If that's not necessary after moving to ephemeral storage, that's fine by me!
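
For illustration, here is a rough sketch of that kind of split, not the actual script used. Hadoop cannot split a single gzipped file across mappers, so the number of files caps the parallelism. The function name `split_into_gzip_parts`, the round-robin distribution, and the `part-NNNNN.gz` naming are all assumptions made for this example.

```python
import gzip
import os
from typing import Iterable, List


def split_into_gzip_parts(lines: Iterable[str], num_parts: int,
                          out_dir: str, prefix: str = "part") -> List[str]:
    """Distribute lines round-robin across num_parts gzipped text files.

    Hypothetical helper: each output file can be handed to one mapper,
    so num_parts should be at least the number of mapper slots.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"{prefix}-{i:05d}.gz")
             for i in range(num_parts)]
    handles = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        for n, line in enumerate(lines):
            # Make sure every record ends with exactly one newline.
            text = line if line.endswith("\n") else line + "\n"
            handles[n % num_parts].write(text)
    finally:
        for h in handles:
            h.close()
    return paths
```

Something like `split_into_gzip_parts(open("input.txt"), 200, "parts")` would produce 200 files; how large each one ends up depends on the input and on how well it compresses.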

1 comments

It's not necessary once your files live on ephemeral storage, but it is still necessary if you want the distcp operation itself to be fast. Then again, the S3 block filesystem will not have this problem.