|
|
|
|
|
by EdwardDiego
1512 days ago
|
|
Indeed, every distributed query engine I've used can easily parallelise CSV in the same file (so long as it's splittable, friends don't let friends gzip their data), with the option to ignore bad rows, log them, or throw your hands up and die. Admittedly, all of them are Java based and use Hadoop libs for handling CSV, which makes sense, the Elephant ecosystem has spent years getting this stuff right. |
|