by SammyStacks
669 days ago
From a pragmatic viewpoint, the CSVs that I get from finance (usually saved as .xlsx) have the same parsing issues as any CSV. But since the issues are consistent, I can automate the conversion from .xlsx to CSV, then run the CSV through awk to fix the known problems before any further parsing (import, analysis, etc.). Sure, I'm essentially parsing the CSV twice, but because the issues are consistent, automating it keeps the process efficient. Obviously that wouldn't work for CSVs with varying structures, but it can be effective in certain workplace scenarios.
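For anyone curious what that kind of cleanup pass looks like, here is a minimal sketch in Python rather than awk, using only the standard library. The specific issues it fixes (a title banner above the header, trailing empty columns, thousands separators in an amount column) are assumptions made up for illustration, as are the names `clean_rows`, `to_csv`, `skip_rows` and `amount_col`. It starts from text that has already been converted to CSV; the .xlsx conversion step is not shown.

```python
import csv
import io


def clean_rows(raw_text, skip_rows=1, amount_col=2):
    """Apply the same fixes to every export.

    The fixes are hypothetical examples of "consistent issues":
    banner rows above the header, padded cells, trailing empty
    columns, blank rows, and thousands separators in one column.
    """
    reader = csv.reader(io.StringIO(raw_text))
    cleaned = []
    for index, row in enumerate(reader):
        if index < skip_rows:
            continue  # drop the banner rows that sit above the header
        row = [cell.strip() for cell in row]
        while row and row[-1] == "":
            row.pop()  # drop trailing empty cells left by the export
        if not row:
            continue  # skip rows that were entirely empty
        # Data rows only (the header is the first row after the banner).
        if index > skip_rows and len(row) > amount_col:
            row[amount_col] = row[amount_col].replace(",", "")
        cleaned.append(row)
    return cleaned


def to_csv(rows):
    """Serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "Finance export Q1,,,\n"
        "name,dept,amount,\n"
        ' Alice ,Ops,"1,200.50",\n'
        ",,,\n"
        "Bob,IT,300,\n"
    )
    print(to_csv(clean_rows(sample)), end="")
```

The point is the same as with the awk approach: the rules are hard-coded because the source is predictable, so the script is only as good as that assumption.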
|
However, if you ever have the misfortune of dealing with human-generated files (particularly Excel files), then you will suffer much pain and loss.
I once had to deal with a "CSV" that had not one, not two, but six (!) distinct date formats in the same file. Life as a data scientist kinda sucks sometimes :shrug:.
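A common way to cope with mixed date formats is to try a list of candidate formats in order and normalize whatever matches to ISO 8601. The sketch below is just that, not what was actually used on that file: the six formats in `CANDIDATE_FORMATS` and the function name `normalize_date` are assumptions for illustration, since the real formats aren't stated.

```python
from datetime import datetime

# Hypothetical candidates; order matters for ambiguous inputs.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",   # 2023-01-15
    "%d/%m/%Y",   # 15/01/2023 (day first)
    "%m-%d-%Y",   # 01-15-2023 (month first)
    "%d %b %Y",   # 15 Jan 2023 (month names depend on locale)
    "%B %d, %Y",  # January 15, 2023
    "%Y%m%d",     # 20230115
]


def normalize_date(text, formats=CANDIDATE_FORMATS):
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    value = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    return None


if __name__ == "__main__":
    for raw in ["2023-01-15", "15/01/2023", "01-15-2023", "20230115", "n/a"]:
        print(raw, "->", normalize_date(raw))
```

The catch is ambiguity: a value like 03/04/2023 parses fine as either day-first or month-first, so the first matching format silently wins. Returning None for unmatched values at least makes the leftovers easy to find and review by hand.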