by necrotic_comp
807 days ago
Knowing the command line is one of those unsung critical skills of being a software engineer. It doesn't matter much what flavor of command line you're using (though bash is great for obvious reasons), but the ability to glue disparate commands together through pipes is one of the most efficient ways of getting throwaway work done. Heck, just knowing IFS=$'\n' and how to do a for loop will get you a long way.
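For anyone who hasn't run into the IFS=$'\n' trick, here is a minimal sketch of what it buys you. The file names are made up for illustration, and the temp directory exists only so the snippet is self-contained. Setting IFS to a newline makes the unquoted $(...) split on lines instead of on every space, so names with spaces survive the for loop.

```bash
#!/usr/bin/env bash
# Sketch: loop over the lines (not the words) of a pipeline's output.
# The sample file names are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/first report.csv" "$workdir/second report.csv"

old_ifs=$IFS
IFS=$'\n'   # split on newlines only, so spaces inside names are kept
count=0
# find | sort is the "glue commands with pipes" part; the loop consumes it.
for f in $(find "$workdir" -name '*.csv' | sort); do
    echo "found: $(basename "$f")"
    count=$((count + 1))
done
IFS=$old_ifs   # restore the default so later word splitting behaves normally

echo "count=$count"
```

With the default IFS the loop would see four words ("first", "report.csv", and so on) instead of two file names. This still breaks on names containing newlines or glob characters, which is usually an acceptable trade for throwaway work.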
Yep, the command line is what lets you solve "We have 178 CSV dumps of tables that have ~60 GB of data and we want them imported into a SQL database, there's no previous DB schema info, here's a zip file of questionably named CSVs, can you have this done in 2 days?". Meanwhile there are 8,000+ columns of data that are strings, booleans, datetimes, etc. and some of the files are 15 GB each.
It didn't take much shell scripting to solve that problem with something you can run against a directory of CSV files: it produces SQL files with the table schemas to create, then generates the SQL to efficiently import the data from each CSV. Basically a little bit of shell scripting and tools like find, head, sed, grep, wc and friends. It took 4 hours to solve the problem in a way that was testable.
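To give a rough idea of the shape of such a script, here is a minimal sketch of the schema half only. It is not the original script. It assumes every file has a header row, that fields are comma-separated with no quoted commas, and it types every column as TEXT rather than detecting booleans or datetimes. The function names sanitize and csv_to_schema are invented for this example, and the import statement is left out because it depends on the target database.

```bash
#!/usr/bin/env bash
# Sketch: emit one CREATE TABLE statement per CSV file, based on its header.
# Assumptions: header row present, no quoted commas, all columns typed TEXT.
set -euo pipefail

# Turn an arbitrary string into a lowercase SQL-safe identifier.
sanitize() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9_]/_/g' -e 's/^\([0-9]\)/_\1/'
}

# Print a CREATE TABLE statement for one CSV, reading only its first line,
# so a 15 GB file costs no more than a tiny one.
csv_to_schema() {
    local csv=$1 table header col cols=""
    local -a columns
    table=$(sanitize "$(basename "$csv" .csv)")
    header=$(head -n 1 "$csv" | tr -d '\r')   # drop a Windows line ending if present
    IFS=',' read -r -a columns <<< "$header"
    for col in "${columns[@]}"; do
        cols+="${cols:+, }$(sanitize "$col") TEXT"
    done
    printf 'CREATE TABLE %s (%s);\n' "$table" "$cols"
}

# Usage: ./schema.sh /path/to/csv_dir > schema.sql
if [ "$#" -ge 1 ]; then
    find "$1" -name '*.csv' -print0 \
        | while IFS= read -r -d '' f; do csv_to_schema "$f"; done
fi
```

Because each function reads a file and writes to stdout, it is easy to test against a couple of tiny fixture CSVs before pointing it at the real dumps. Type detection could be layered on by sampling rows with head and grep, which this sketch does not attempt.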