by NortySpock 818 days ago
So, grab yourself a Linux box (I suggest Debian), a large CSV or JSON Lines file you need to slice up, and an hour of time, and start trying out some bash one-liners on your data. Set some goals like "find the Yahoo email addresses in the data and sort by frequency" or "find error messages that look like X" or "find how many times Ben Franklin mentions his wife in his autobiography".
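To make the first goal concrete, here's a rough sketch of the "Yahoo addresses by frequency" one-liner. The file name users.csv and its contents are made up for illustration, and the email regex is deliberately loose, so treat it as a starting point rather than a proper address parser.

```shell
# Hypothetical sample data: users.csv and its rows are invented for this sketch.
tmpdir=$(mktemp -d)
cat > "$tmpdir/users.csv" <<'EOF'
id,name,email
1,Ann,ann@yahoo.com
2,Bob,bob@gmail.com
3,Cy,cy@yahoo.com
4,Ann,ANN@Yahoo.com
EOF

# grep -o prints only the matched text, -i ignores case, -E enables extended regex.
# tr lowercases so ANN@Yahoo.com and ann@yahoo.com count as the same address.
# sort | uniq -c counts duplicates; sort -rn puts the most frequent first.
# awk just normalizes the leading whitespace that uniq -c emits.
yahoo_counts=$(grep -Eio '[a-z0-9._%+-]+@yahoo\.com' "$tmpdir/users.csv" \
  | tr '[:upper:]' '[:lower:]' \
  | sort \
  | uniq -c \
  | sort -rn \
  | awk '{print $1, $2}')

printf '%s\n' "$yahoo_counts"

rm -rf "$tmpdir"
```

The sort | uniq -c | sort -rn tail is the part worth memorizing; it turns any stream of lines into a frequency table.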

Here's the thing. These tools have been used since the '70s to slice, dice and filter log files, CSVs, or other semi-structured data. They can be chained together with the pipe operator (|), so each tool's output becomes the next tool's input. Sysadmins were going through 100MB logs with these tools before CPUs hit the gigahertz mark.
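As a small illustration of that chaining, here's a sketch that pulls the error lines out of a log and counts each distinct message. The app.log file, its timestamps, and the "date time LEVEL message" layout are all assumptions invented for the example; adjust the cut field number for whatever your logs actually look like.

```shell
# Hypothetical log file: the name and the "date time LEVEL message" layout are assumed.
logdir=$(mktemp -d)
cat > "$logdir/app.log" <<'EOF'
2024-01-01 10:00:01 INFO started
2024-01-01 10:00:02 ERROR disk full
2024-01-01 10:00:03 WARN slow response
2024-01-01 10:00:04 ERROR disk full
2024-01-01 10:00:05 ERROR timeout
EOF

# Each stage reads the previous stage's output through the pipe:
#   grep  keeps only the ERROR lines
#   cut   drops date, time and level, keeping field 4 onward (the message)
#   sort | uniq -c   counts identical messages
#   sort -rn         orders by count, highest first
#   sed   strips the leading padding that uniq -c adds
error_summary=$(grep ' ERROR ' "$logdir/app.log" \
  | cut -d ' ' -f 4- \
  | sort \
  | uniq -c \
  | sort -rn \
  | sed 's/^ *//')

printf '%s\n' "$error_summary"

rm -rf "$logdir"
```

Each stage can be run on its own first, which is the nice thing about pipes: build the command up one step at a time and look at the output as you go.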

These tools are blisteringly fast, and they come installed on basically every Linux machine.

https://github.com/onceupon/Bash-Oneliner

And for a different play-by-play example:

https://adamdrake.com/command-line-tools-can-be-235x-faster-...