by jtheory
4276 days ago
Taken with a grain of salt, of course, because the winner is the one running the competition. Worth noting:
- All but the last-place finisher here are actually placed quite closely in performance, given that they're working on a 3 million record file.
- Other performance stats could be more relevant (depending on what you're doing with CSV...): startup time, memory footprint, and any differences in handling very long or very short rows.
- Given similar performance on the above, what's actually more important (for most uses): elegance & consistency of API, support for various CSV formats (e.g., Excel vs. RFC-4180 vs. the flexibility to roll your own format), and sensible error handling options (like: don't blow up if there's one row with a different number of columns).

I've hardly reviewed any of these, so I can't really compare them usefully, but I've been using the Apache Commons CSV parser 1.0 version recently (finally released after who knows how many years in semi-hibernation!), and it's been pleasant to work with thus far.
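To make the "don't blow up on one odd row" point concrete, here is a rough sketch using Python's standard `csv` module rather than any of the Java parsers in the benchmark. The function name `read_lenient` and the sample data are made up for illustration; the idea is just to set malformed rows aside with their line number instead of aborting the whole parse.

```python
import csv
import io

def read_lenient(text, expected_columns):
    """Split parsed rows into well-formed ones and ones with the wrong column count.

    Hypothetical helper: nothing is raised for a short or long row; it is
    recorded together with the input line number where the row ended.
    """
    good, bad = [], []
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        if len(row) == expected_columns:
            good.append(row)
        else:
            bad.append((reader.line_num, row))
    return good, bad

# Made-up sample: the third line has only two columns,
# and the fourth contains a quoted field with an embedded comma.
sample = 'a,b,c\n1,2,3\n4,5\n"x, y",6,7\n'
good, bad = read_lenient(sample, 3)
```

Whether a bad row should be skipped, logged, or padded is really an application decision, which is why a parser that exposes this as an option is nicer than one that hard-codes either behavior.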
Definitely applaud the effort, and it would be good to extend the test corpus in terms of record length and escape complexity. I do think 3M records is on the low side. It would be good to see scale tests for 10M, 100M, and 1BN records too.
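As a sketch of what a more varied corpus could look like (again in Python's standard `csv` module, not the benchmark's own tooling), the following generates records with a random number of fields and a mix of plain, comma-containing, quote-containing, multi-line, and empty values. The function name `make_corpus`, the field values, and the sizes are all assumptions chosen for illustration.

```python
import csv
import io
import random

def make_corpus(num_records, max_fields=8, seed=0):
    """Generate CSV text with varying record lengths and escape complexity.

    Hypothetical generator: returns the CSV text and the original rows so a
    parser's output can be compared against what was written.
    """
    rng = random.Random(seed)
    tricky = ['plain', 'has,comma', 'has "quote"', 'multi\nline', '']
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    rows = []
    for _ in range(num_records):
        row = [rng.choice(tricky) for _ in range(rng.randint(1, max_fields))]
        writer.writerow(row)
        rows.append(row)
    return buf.getvalue(), rows

# Small run for demonstration; a real scale test would stream to a file instead.
text, rows = make_corpus(1000)
parsed = list(csv.reader(io.StringIO(text, newline='')))
```

For the 10M to 1BN range the same loop would need to write directly to disk rather than hold everything in memory, but a round-trip check like this on a small sample is a cheap way to confirm the corpus itself is valid before timing anything.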