by antonycourtney 3306 days ago
I invested some effort to keep it performant even with fairly large CSV files, including a custom port of some C++ code for fast CSV import. My current favorite example is the Met Museum's 228MB, 450k-row collection data set, which takes about 12 seconds to open in Tad on my 2013 MacBook Pro. That's definitely not lag-free (and lag-free is hard to achieve without going to a serious column-store data warehouse like Amazon Redshift), but still reasonable. https://twitter.com/antonycourtney/status/869252722624561152
2 comments

Thanks for putting this out there!

There are some projects out there that use memory-mapped files for fast CSV parsing. That could be a nice way to speed up loading into memory and to scroll through the file in real time. I can't find the link to the library I saw it used in, but it might be an interesting avenue to consider. Another library that seems to do this is Astropy's fast ASCII I/O module [1].

[1]: http://docs.astropy.org/en/stable/io/ascii/fast_ascii_io.htm...
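To make the memory-mapping idea concrete, here is a rough sketch in Python using only the standard library. It is not how Tad or Astropy do it; the function names `index_line_offsets` and `read_rows` are made up for illustration. It assumes UTF-8 input and no quoted fields containing embedded newlines, which a real CSV reader would have to handle.

```python
import csv
import io
import mmap
import os


def index_line_offsets(path):
    """Return the byte offset at which each line starts.

    Hypothetical helper: scans a memory-mapped file for b"\n".
    Assumes no quoted field contains an embedded newline.
    """
    offsets = []
    if os.path.getsize(path) == 0:
        return offsets  # mmap cannot map an empty file
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos, size = 0, len(mm)
            while pos < size:
                offsets.append(pos)
                nl = mm.find(b"\n", pos)
                if nl == -1:
                    break  # last line has no trailing newline
                pos = nl + 1
    return offsets


def read_rows(path, offsets, start, count):
    """Parse only rows [start, start + count) using the offset index."""
    end_index = min(start + count, len(offsets))
    if start >= end_index:
        return []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            begin = offsets[start]
            end = offsets[end_index] if end_index < len(offsets) else len(mm)
            # Slicing copies just this window out of the mapping.
            chunk = mm[begin:end].decode("utf-8")
    return list(csv.reader(io.StringIO(chunk, newline="")))
```

The point of the offset index is that scrolling to row N only has to decode and parse the visible window, instead of the whole file. Whether this beats plain reads in practice would need measuring.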

Try benchmarking OS read() calls against memory mapping, with either sequential or random reads. Whenever I do this, the read() calls end up being quite a bit faster.
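A quick way to try that comparison yourself is sketched below in Python with the standard library. The helper names (`count_newlines_read`, `count_newlines_mmap`, `best_time`) and the 1 MiB chunk size are arbitrary choices for illustration, and it only covers the sequential case. Results will depend heavily on the OS, the disk, and whether the file is already in the page cache, so treat any numbers as local observations.

```python
import mmap
import os
import time


def count_newlines_read(path, chunk_size=1 << 20):
    """Sequential scan using unbuffered read() calls."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def count_newlines_mmap(path, chunk_size=1 << 20):
    """Sequential scan over a read-only memory mapping."""
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map an empty file
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, len(mm), chunk_size):
                total += mm[start:start + chunk_size].count(b"\n")
    return total


def best_time(fn, path, repeats=3):
    """Best wall-clock time of several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(path)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    import sys

    csv_path = sys.argv[1]  # path to any large CSV file
    print("read():", best_time(count_newlines_read, csv_path))
    print("mmap: ", best_time(count_newlines_mmap, csv_path))
```

Both functions do the same work (counting newlines), so the timing difference mostly reflects the I/O path rather than the parsing.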
Are you familiar with the R package data.table? Its CSV parser is blazing fast. Pandas (the Python tabular data library) also implements a speedy CSV parser. Both are written in C under the hood.