by SahAssar
746 days ago
If you propose to use a TSV parser that does not handle escaping then that sounds very unsafe to me. Do you also want to skip checking for escaped newlines? Or escaped backslashes? What you are proposing is not using TSV, but a format that completely bans newlines or tabs in any data. There are certainly uses for such a format but to make it non-risky to use you'd need strict validation on the input to the encoder and make it very clear that it is not TSV, since it does not follow the rules of TSV encoding/decoding and will not produce the same data as a proper TSV implementation.
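For concreteness, here is a minimal sketch of what "strict validation on the input to the encoder" could look like for such a not-really-TSV format. The names `encode_row` and `decode_row` are hypothetical, and the rule (reject any field containing a tab, newline, or carriage return) is an assumption about what the restricted format would ban, not part of any TSV specification.

```python
def encode_row(fields):
    """Join fields with tabs, refusing any value that would need escaping.

    Hypothetical helper: this is a restricted tab-delimited format, not TSV.
    """
    for value in fields:
        if any(ch in value for ch in "\t\n\r"):
            raise ValueError(f"field contains a tab or line break: {value!r}")
    return "\t".join(fields)


def decode_row(line):
    """Split one line back into fields.

    A plain split is only safe because encode_row rejected every value
    that could contain the delimiter or a line break.
    """
    return line.rstrip("\r\n").split("\t")
```

With the check on the writing side, the reader can stay a plain split; without it, a single stray tab silently shifts every later column.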
What I'm trying to get at is that there are situations in which implementing such a limited parser is justifiable (and, for the main discussion in this thread, TSV makes this more commonly achievable than CSV).
With the luxury of time, all our parsers would handle delimiter escaping, Unicode, control characters, byte order marks, etc., perfectly, and truly parse "TSV" and "CSV". Personally, I work on-call in SRE: if something is broken, we need solutions NOW. If I have a CSV of stuff, I am not going to implement a proper parser. I don't even have time to boot up a programming language with a CSV library. I am going to split by comma in the terminal of whatever box I'm logged into to get what I need. Most of the time it'll work, and, to the discussion in the thread, TSV makes it more likely to work because it's less likely for the delimiter to be in the data, and less likely that I need those 5-6 extra characters of regex lookbehind.
My main point: as a consumer of these files, I prefer it when people send me TSVs rather than CSVs, because I am more likely to be able to use a simple not-really-TSV/CSV parser to read them. Sometimes the data's really messy and I need a real parser, but TSV makes this less likely.
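A quick illustration of the trade-off, as a sketch with made-up sample rows (the host and city values are hypothetical, as are the helper names `naive_split` and `parse_csv_line`). It compares a bare split against Python's standard `csv` module on the same record in both formats:

```python
import csv
import io


def naive_split(line, delimiter):
    """The 'split in the terminal' approach: no quoting, no escaping."""
    return line.rstrip("\r\n").split(delimiter)


def parse_csv_line(line):
    """Parse a single CSV record with the standard library's csv reader."""
    return next(csv.reader(io.StringIO(line)))


# Hypothetical sample record: the middle field contains a comma.
csv_line = 'web-01,"Dallas, TX",up'
tsv_line = "web-01\tDallas, TX\tup"

print(naive_split(csv_line, ","))   # quoted field is cut in two
print(parse_csv_line(csv_line))     # real parser keeps it intact
print(naive_split(tsv_line, "\t"))  # bare split works: no tab in the data
```

The bare split only breaks on the CSV row, because a comma shows up in ordinary data far more often than a tab does. It would break on the tab-delimited row too if a field contained a tab, which is the messy case where a real parser is needed.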