by modulus1 795 days ago
Can't store tabs or newlines, odd choice.

Reading quickly, it's because the tab is used to indicate nested tabular data in a column. I wonder why not just have a zsv in the zsv?
yeah, this is a limitation from the TSV format this is based on - there is an extension to the format that supports storing binary blobs - ref: https://github.com/Hafthor/zsvutil?tab=readme-ov-file#nested...
(Offtopic, but just FYI) it's tenet (principle) not tenant (building resident).
doh. fixed. thanks!
Basically it's the same limitations as CSV.

At least you could use something less likely to appear in data as the record separator (like 0x1E).

Otherwise it's an interesting idea!

0x1E is the record separator in ASCII, defined precisely for this purpose. Too bad it's not popular; here we're stuck with inferior TSV/CSV.
I can't easily type that out - and once the format can't be read / edited in a simple text editor, I'm starting to lean towards a nice binary format like protobuf.
Strings can contain 0x1E, so it has exactly the same issues as a tab character but with all the downsides of it not being an easy, “simple” character.
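
For anyone following this subthread, here is a minimal sketch of the 0x1E idea, assuming 0x1F (the ASCII unit separator) is used between fields. The function names are made up for illustration. It shows both sides of the argument: tabs and newlines survive untouched, but a field that itself contains 0x1E or 0x1F still has to be rejected.

```python
RS = "\x1e"  # ASCII record separator, used here between rows
US = "\x1f"  # ASCII unit separator, used here between fields (an assumption)


def encode(rows):
    """Join rows of string fields, refusing fields that contain a separator."""
    for row in rows:
        for field in row:
            if RS in field or US in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def decode(text):
    """Split text produced by encode() back into rows of fields."""
    return [record.split(US) for record in text.split(RS)]


rows = [["id", "note"], ["1", "tab\there, newline\nhere"]]
assert decode(encode(rows)) == rows  # tabs and newlines are ordinary data
```

So the set of banned characters just moves from tab/newline to 0x1E/0x1F, which is rarer in real text but harder to type.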
As far as I know, thanks to quoting it is possible to put basically any data you want in a CSV.
The problem is there is no uniform standard for quoting and escaping in CSV, and different software uses different variants.
There is a standard, and it is very simple and easy to use.

Different software uses different variants because we're not allowed to have nice things and devs are too lazy to use something slightly more complicated than .split(',')

Though if you're going to ban some common characters anyway like TSV, you might as well use CSV and ban commas, newlines, and quotation marks.
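
To make the quoting point concrete, here is a small sketch using Python's standard csv module with its default dialect, assuming the standard meant above is RFC 4180 style quoting (fields wrapped in double quotes, embedded quotes doubled). The helper names to_csv and from_csv are hypothetical.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with the csv module's default (Excel-style) quoting."""
    buf = io.StringIO(newline="")  # newline="" avoids newline translation
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse text written by to_csv() back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["plain", "a,b", 'say "hi"', "two\nlines", "tab\there"]]
assert from_csv(to_csv(rows)) == rows  # commas, quotes, newlines, tabs survive
```

A naive .split(',') on the same text would break on the second field, which is the failure mode being complained about.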

can't you just do quoting?
https://github.com/Hafthor/zsvutil?tab=readme-ov-file#what-a...

> Any escaping or encoding of these characters would make the format less human-readable, harder to parse and could introduce ambiguity and consistency problems.

Found the wording of "could introduce ambiguity and consistency problems" a bit odd, but guess they mean that even if things are specified precisely (so there's no ambiguity) not everyone would follow the rules or something? And they want to play nice with other tools following the TSV "standard"
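
For comparison, this is roughly what escaping would look like if a TSV-like format allowed it. This is only a sketch of one possible convention (backslash escapes for tab, newline, carriage return, and backslash); it is not something the zsvutil README specifies, and the function names are invented here.

```python
# Assumed convention: backslash escapes, not part of the format discussed above.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(value):
    """Replace tab, newline, CR, and backslash with two-character escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value):
    """Reverse escape_field(); unknown escape sequences are kept as written."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


original = "col\twith tab\nand a \\ backslash"
escaped = escape_field(original)
assert "\t" not in escaped and "\n" not in escaped
assert unescape_field(escaped) == original
```

The consistency worry is visible even here: a tool that does not know about the escapes will show a literal backslash-t, and someone has to decide what an unknown escape means.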

Please. I wrote a csv parser a couple weeks ago in an hour or two. It's not that hard to handle the quoting and edge cases. Yes, maybe different parsers will handle them differently, but just document your choices and that's that. How is ambiguity better than completely disallowing certain chars? That's a non-starter.
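
As a rough illustration of how small such a parser can be, here is a sketch of a quote-aware CSV parser written as a character-by-character state machine. It is not the parser mentioned above, just one possible version. Its documented choices: a double quote anywhere in a field toggles quoted mode, a doubled quote inside quotes is a literal quote, LF, CR, and CRLF all end a record outside quotes, and a trailing newline does not produce an extra empty row.

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text with double-quote quoting into a list of rows."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # delimiters and newlines are data here
        elif ch == '"':
            in_quotes = True
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line ending
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:  # final record without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'a,"b,c"\r\n"say ""hi""","x\ny"\n'
assert parse_csv(sample) == [["a", "b,c"], ['say "hi"', "x\ny"]]
```

Each of those choices is exactly the kind of thing another parser might decide differently, which is where the interoperability complaints earlier in the thread come from.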