Hacker News
by bangonkeyboard 2425 days ago
For the quoted spreadsheet-like operations of filtering and rearranging, awk is perfect as a deferred editor. That the kludgy first step of your chosen solution ("First, you must create a CSV file contain only the first 10-20 lines of your large CSV file") isn't just `head very_large_nov_2019.csv > very_large_nov_2019_abridged.csv` seems to further indicate an unfamiliarity with the large set of built-in, battle-tested UNIX tools for dealing with text files.

The first tools I reach for when dealing with CSVs of these and larger magnitudes are less, cut, awk, etc. They also tend to be the last tools I end up needing.
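
To make the filtering and rearranging concrete, here is a minimal sketch. The file `sales.csv` and its columns are hypothetical sample data, not from the article, and the sketch assumes simple CSV with no quoted fields or embedded commas.

```shell
# Hypothetical sample data standing in for a large CSV (no quoted fields).
tmp=$(mktemp -d)
printf '%s\n' \
  'id,name,city,amount' \
  '1,alice,Oslo,120' \
  '2,bob,Bergen,80' \
  '3,carol,Oslo,200' > "$tmp/sales.csv"

# Peek at the first lines without opening the whole file.
head -n 2 "$tmp/sales.csv"

# Keep only the name and amount columns (fields 2 and 4).
names_amounts=$(cut -d, -f2,4 "$tmp/sales.csv")

# Skip the header, keep rows with amount > 100,
# and rearrange the output so amount comes before name.
filtered=$(awk -F, -v OFS=, 'NR > 1 && $4 > 100 { print $4, $2 }' "$tmp/sales.csv")

printf '%s\n' "$names_amounts" "$filtered"
```

All three commands read the file as a stream, so none of them needs the whole file in memory.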

1 comment

How well do those tools work with arbitrary CSV files, e.g. ones containing line breaks or quotes in field data? I wasn't aware that they can actually parse CSV; as far as I know, you instead have to assume things about the content that may not end up being true.
Every data processing task has to make assumptions about the well-formedness of its input. "Arbitrary CSV" is basically undefined; whether deviations are best dealt with by parsing, preprocessing, or different tools altogether depends on the source.
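
As an illustration of checking one such assumption before trusting `cut` or `awk`, here is a minimal sketch. The file `notes.csv` is hypothetical, and the field-count check is only a heuristic, not a CSV parser: it flags lines whose comma count differs from the header's, which catches many quoted commas and embedded line breaks but not every case.

```shell
tmp=$(mktemp -d)
# Hypothetical file: the last record has a quoted field containing a comma.
printf '%s\n' \
  'id,name,note' \
  '1,alice,ok' \
  '2,bob,"late, refunded"' > "$tmp/notes.csv"

# Naive comma splitting cuts the quoted field in half.
naive=$(cut -d, -f3 "$tmp/notes.csv" | tail -n 1)

# Cheap sanity check: print the line numbers whose field count
# differs from the header's field count.
bad_lines=$(awk -F, 'NR == 1 { n = NF; next } NF != n { print NR }' "$tmp/notes.csv")

printf 'naive third field: %s\n' "$naive"
printf 'lines with unexpected field count: %s\n' "$bad_lines"
```

If the check prints nothing, plain delimiter splitting is probably adequate for that file; if it prints line numbers, that is the cue to preprocess or switch to a tool that parses quoting.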