|
|
|
|
|
by zAy0LfpBZLC8mAC
4400 days ago
|
|
What would happen is that you first would replace #COMMA, with #COMMA#COMMA# and then later replace that with ,COMMA# , thus garbling the data. The way to make the data accessible is to request the producer to be fixed, it's that simple. If that is completely impossible, you'll have to figure out the grammar of the data that you actually have and build a parser for that. Your suggested strategy does not work. |
|
https://github.com/dbro/csvquote is a small and fast script that can replace ambiguous separators (commas and newlines, for example) inside quoted fields, so that other text tools can work with a simple grammar. After that work is done, the ambiguous commas inside quoted fields get restored. I wrote it to use unix shell tools like cut, awk, ... with CSV files containing millions of records.