|
We're coming at this from different angles. I completely agree with you that the proper way to read these files is with a fully standards-compliant parser. You make the distinction that a parser that can't handle tabs in the data doesn't technically parse "TSV", but rather a subset of TSV-like files with limitations. Sure, that makes sense. What I'm trying to get at is that there are situations in which implementing such a limited parser is justifiable (and for the main discussion in this thread, TSV makes this more commonly achievable than CSV). With the luxury of time, all our parsers would handle delimiter escaping, Unicode, control characters, byte order marks, etc., perfectly, and truly parse "TSV" and "CSV". Personally, I work on-call in SRE: if something is broken, we need solutions NOW. If I have a CSV of stuff, I am not going to implement a proper parser. I don't even have time to boot up a programming language with a CSV library. I am going to split by comma in the terminal of whatever box I'm logged into to get what I need. Most of the time it'll work, and to the discussion in the thread, TSV makes it more likely to work because the delimiter is less likely to appear in the data. Less likely to need those 5-6 extra characters of regex lookbehind. My main point: as a consumer of these files, I prefer it when people send me TSVs rather than CSVs, because I am more likely to be able to use a simple not-really-TSV/CSV parser to read them. Sometimes the data's really messy and I need a real parser, but TSV makes this less likely. |
My point is that you are not really talking about CSV/TSV, since your parser does not handle CSV/TSV. You are using a custom data format. Which is fine and perfectly reasonable, and it's probably specified to avoid all those issues.
But it is not CSV or TSV. When you say "a simple not-really-TSV/CSV parser to read them", you mean you are not using CSV or TSV. That's fine for non-CSV and non-TSV usage. Just be clear about what format you are actually using and specify it. It clearly isn't TSV or CSV.
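
To make the distinction concrete, here is a rough Python sketch of the two approaches side by side. The sample rows and the helper names `naive_split` and `parse_with_csv_module` are made up for illustration and don't come from anyone's actual tooling. It only shows one failure mode (a delimiter inside a quoted field), not everything a real parser has to deal with.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only.
csv_line = 'web-01,"Smith, Jane",restarted'
tsv_line = 'web-01\tSmith, Jane\trestarted'


def naive_split(line, delimiter):
    """Split on every delimiter, with no notion of quoting or escaping."""
    return line.split(delimiter)


def parse_with_csv_module(line, delimiter):
    """Parse a single record with the standard library's csv reader."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


naive_csv = naive_split(csv_line, ",")
real_csv = parse_with_csv_module(csv_line, ",")
naive_tsv = naive_split(tsv_line, "\t")
real_tsv = parse_with_csv_module(tsv_line, "\t")

print(naive_csv)  # the quoted comma splits one field into two
print(real_csv)   # the quoted field stays in one piece
print(naive_tsv == real_tsv)  # the two approaches agree on this row
```

On this particular data the naive split is only wrong for the comma-delimited row. It would break in the same way on a tab-delimited row whose fields contain a tab, which is the case where "limited parser" and "the actual format" stop being the same thing.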