by deno 3581 days ago
Better than node.js

    import * as csv from 'csv-parse';
    import * as fs from 'fs';
    
    type Line = [string,string,string,string,string,string];
    
    const parser = new csv.Parser({});
    
    parser.on('data', (line: Line) => {
        if (line[0] === '42') {
            console.dir(line);
        }
    });
    
    fs.createReadStream('mock_data.csv').pipe(parser);
    
    $ /usr/bin/time node parse_csv.js
    43.61user 0.85system 0:45.61elapsed 97%CPU (0avgtext+0avgdata 60076maxresident)k
    
    $ node --version
    v6.4.0
Edit: Using fast-csv

    24.28user 0.20system 0:24.58elapsed 99%CPU (0avgtext+0avgdata 91780maxresident)k
3 comments

Probably has to do with using `try/catch`, which is not optimized by V8. A different parser is 10 times faster on my machine.

https://www.npmjs.com/package/csv-parser

Edit: `fast-csv` seems to run a lot of `RegExp`s on each iteration, which can't be that fast compared to csv-parser, which seems to simply walk over each character (state machine?).

    4.95user 0.19system 0:05.22elapsed 98%CPU (0avgtext+0avgdata 29704maxresident)k
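To illustrate the "state machine?" guess above, here is a minimal sketch of a character-by-character CSV parser with no regular expressions. It is not code from csv-parser or fast-csv; `parseCsv` and `ParseState` are hypothetical names, and it assumes comma delimiters, `"` quoting with `""` as the escape, and input that fits in one string.

```typescript
// Hypothetical sketch, not the actual csv-parser implementation.
type ParseState = 'field' | 'quoted' | 'quoteInQuoted';

function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = '';
  let state: ParseState = 'field';

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    switch (state) {
      case 'field':
        if (ch === '"' && field === '') {
          state = 'quoted';            // opening quote at the start of a field
        } else if (ch === ',') {
          row.push(field);             // end of field
          field = '';
        } else if (ch === '\n') {
          row.push(field);             // end of field and end of row
          rows.push(row);
          row = [];
          field = '';
        } else if (ch !== '\r') {
          field += ch;                 // ordinary character (CR is skipped)
        }
        break;
      case 'quoted':
        if (ch === '"') state = 'quoteInQuoted';
        else field += ch;              // commas and newlines are literal here
        break;
      case 'quoteInQuoted':
        if (ch === '"') {
          field += '"';                // "" inside quotes is an escaped quote
          state = 'quoted';
        } else {
          state = 'field';             // it was the closing quote
          i--;                         // reprocess this character as unquoted
        }
        break;
    }
  }
  // Flush a final row that has no trailing newline.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

Each character is inspected once and the only state is one enum value plus the current field, which is roughly what "simply go over each symbol" would look like. Whether that accounts for the timing difference is not something this sketch shows.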
A lot better but that’s still 5× slower than Python.
csv-parse is hardly the only CSV parser for node, and it is by far the slowest: https://github.com/phihag/csv-speedtest (csv2json depends on csv-parse, so it's unsurprising that it's even slower)
I chose the most popular one on npm because Go and Python are using stdlib.
But you still wrote "faster than node.js" and not "faster than the most popular npm module" (and popular modules aren't always of great quality or performance-oriented).
The fastest streaming example on that list (csv-parser) is still 5× slower than Python.
I'd assume that has to do with the overhead of dispatching a ton of asynchronous events for relatively little parsed data rather than the intrinsic speed of node: the fastest synchronous parsers on the list are about on par with Python.
You don't need to drop quote handling in order to stream, but handling quotes is a pain.
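A rough sketch of why quotes are a pain when streaming: a chunk boundary can fall inside a quoted field, or even between the two quotes of an escaped `""`, so the parser state has to survive from one chunk to the next. `ChunkedCsvParser` below is a hypothetical class, not the API of any parser named in this thread. It assumes comma delimiters and `""` escapes, and it returns the rows completed by each chunk as one batch instead of firing one event per row (a choice made for the illustration, with no performance claim attached).

```typescript
// Hypothetical sketch: parser state is kept on the instance so that
// quoted fields and escaped quotes can span chunk boundaries.
type ChunkState = 'field' | 'quoted' | 'quoteInQuoted';

class ChunkedCsvParser {
  private state: ChunkState = 'field';
  private field = '';
  private row: string[] = [];

  // Feed one chunk; returns every row that this chunk completed.
  write(chunk: string): string[][] {
    const done: string[][] = [];
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk.charAt(i);
      if (this.state === 'quoted') {
        if (ch === '"') this.state = 'quoteInQuoted';
        else this.field += ch;           // commas and newlines stay literal
        continue;
      }
      if (this.state === 'quoteInQuoted') {
        if (ch === '"') {                // "" is an escaped quote
          this.field += '"';
          this.state = 'quoted';
          continue;
        }
        this.state = 'field';            // closing quote; handle ch below
      }
      if (ch === '"' && this.field === '') {
        this.state = 'quoted';
      } else if (ch === ',') {
        this.endField();
      } else if (ch === '\n') {
        this.endField();
        done.push(this.row);
        this.row = [];
      } else if (ch !== '\r') {
        this.field += ch;
      }
    }
    return done;
  }

  // Call once after the last chunk to flush a row without a trailing newline.
  end(): string[][] {
    if (this.field === '' && this.row.length === 0) return [];
    this.endField();
    const last = this.row;
    this.row = [];
    this.state = 'field';
    return [last];
  }

  private endField(): void {
    this.row.push(this.field);
    this.field = '';
  }
}
```

Usage would be along the lines of calling `write()` for every chunk read from the file and `end()` once at the end, collecting the returned batches.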