by Yomguithereal 449 days ago
A good way to parallelize CSV processing is to split datasets into multiple files, kind of like manual sharding. xan has a parallel command that can perform a wide variety of map-reduce tasks on the split files.

https://github.com/medialab/xan
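
To make the split-then-map-reduce idea concrete, here is a rough sketch in plain Python. It is not how xan implements its parallel command, just an illustration of the same pattern. The names `split_csv`, `count_column`, and `parallel_count` are made up for this example, and it assumes the input file has a header row.

```python
import csv
from collections import Counter
from concurrent.futures import Executor
from pathlib import Path


def split_csv(src: Path, out_dir: Path, rows_per_shard: int) -> list[Path]:
    """Split src into shard files, repeating the header row in each one."""
    shards: list[Path] = []
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: the first row is a header
        out = None
        writer = None
        for i, row in enumerate(reader):
            if i % rows_per_shard == 0:
                # Start a new shard every rows_per_shard data rows.
                if out is not None:
                    out.close()
                path = out_dir / f"shard_{len(shards):04d}.csv"
                shards.append(path)
                out = path.open("w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
            writer.writerow(row)
        if out is not None:
            out.close()
    return shards


def count_column(args: tuple[Path, str]) -> Counter:
    """Map step: count the values of one column in a single shard."""
    path, column = args
    with path.open(newline="") as f:
        return Counter(row[column] for row in csv.DictReader(f))


def parallel_count(shards: list[Path], column: str, executor: Executor) -> Counter:
    """Reduce step: merge the per-shard counters into one total."""
    total: Counter = Counter()
    jobs = [(path, column) for path in shards]
    for partial in executor.map(count_column, jobs):
        total.update(partial)
    return total
```

Passing a `concurrent.futures.ProcessPoolExecutor` as `executor` spreads the shards over CPU cores (create it under an `if __name__ == "__main__":` guard). A `ThreadPoolExecutor` also works but will mostly be limited by the GIL for CPU-bound parsing. Since each shard carries its own header, every shard is a valid CSV file and can be processed independently.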

1 comment

Nice. xsv is also very handy for wrangling CSV files in general.
xan is a maintained fork of xsv