| HN Mirror



	by modulus1 795 days ago
	Can't store tabs or newlines, odd choice.

2 comments

romanows 795 days ago

Reading quick, it's because the tab is used to indicate nested tabular data in a column. I wonder why not just have a zsv in the zsv?


hafthor 795 days ago

yeah, this is a limitation from the TSV format this is based on - there is an extension to the format that supports storing binary blobs - ref: https://github.com/Hafthor/zsvutil?tab=readme-ov-file#nested...


inimino 795 days ago

(Offtopic, but just FYI) it's tenet (principle) not tenant (building resident).


hafthor 795 days ago

doh. fixed. thanks!


alexandreyc 795 days ago

Basically it's the same limitations as CSV.

At least you could use something less likely to appear in data as the record separator (like 0x1E)

Otherwise it's an interesting idea!


tboerstad 795 days ago

0x1E is the record separator in ASCII, defined precisely for this purpose. Too bad it’s not popular; here we’re stuck with inferior TSV/CSV.
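
For illustration, a minimal Python sketch of delimiting with ASCII control characters. It assumes 0x1F (unit separator) between fields and 0x1E (record separator) between records; the names `encode` and `decode` are hypothetical and not taken from zsvutil. Tabs and newlines pass through untouched, but a field that itself contains one of the separators still has to be rejected.

```python
# ASCII control characters set aside for delimiting data.
UNIT_SEP = "\x1f"    # between fields (an assumed choice for this sketch)
RECORD_SEP = "\x1e"  # between records


def encode(rows):
    """Join rows of string fields; reject fields that contain a separator."""
    for row in rows:
        for field in row:
            if UNIT_SEP in field or RECORD_SEP in field:
                raise ValueError("field contains a separator character")
    return RECORD_SEP.join(UNIT_SEP.join(row) for row in rows)


def decode(text):
    """Split encoded text back into rows of fields."""
    if text == "":
        return []
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


# Tabs and newlines survive a round trip because they are not delimiters here.
rows = [["name", "note"], ["a\tb", "line one\nline two"]]
assert decode(encode(rows)) == rows
```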


mattnewton 795 days ago

I can't easily type that out - and once the format can't be read / edited in a simple text editor, I'm starting to lean towards a nice binary format like protobuf.


orf 795 days ago

Strings can contain 0x1E, so it has exactly the same issues as a tab character but with all the downsides of it not being an easy, “simple” character.


bobbylarrybobby 795 days ago

As far as I know, thanks to quoting it is possible to put basically any data you want in a CSV.
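
A quick sketch of that claim using Python's standard `csv` module with its default dialect (the sample data is made up): a field containing a tab, a comma, double quotes, and a newline survives a write/read round trip because the writer quotes it.

```python
import csv
import io

# One field deliberately contains a tab, a comma, quotes, and a newline.
rows = [["id", "text"], ["1", 'tab\there, comma, "quotes"\nand a newline']]

# newline="" stops the text layer from translating line endings.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
encoded = buf.getvalue()

# The reader undoes the quoting and returns the original strings.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))
assert decoded == rows
```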


layer8 795 days ago

The problem is there is no uniform standard for quoting and escaping in CSV, and different software uses different variants.


Dylan16807 795 days ago

There is a standard, and it is very simple and easy to use.

Different software uses different variants because we're not allowed to have nice things and devs are too lazy to use something slightly more complicated than .split(',')

Though if you're going to ban some common characters anyway like TSV, you might as well use CSV and ban commas, newlines, and quotation marks.
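
A short Python sketch of both points, with made-up sample data. First, `.split(',')` breaks on a quoted field while a real parser (here the standard `csv` module) does not. Second, if commas, newlines, and quotation marks are banned from fields, plain splitting becomes safe; `write_restricted` is a hypothetical helper for that variant.

```python
import csv
import io

line = '1,"Doe, Jane","said ""hi"""'

# Naive splitting cuts the quoted field in two and leaves the quotes in place.
naive = line.split(",")
assert naive == ['1', '"Doe', ' Jane"', '"said ""hi"""']

# A parser that understands quoting recovers the three intended fields.
parsed = next(csv.reader(io.StringIO(line)))
assert parsed == ['1', 'Doe, Jane', 'said "hi"']

# Alternative: ban the troublesome characters so no quoting is ever needed.
BANNED = ',"\r\n'


def write_restricted(row):
    """Join fields with commas, rejecting any field with a banned character."""
    for field in row:
        if any(ch in BANNED for ch in field):
            raise ValueError("field contains a banned character")
    return ",".join(row)


# With the ban in place, split(',') is a correct reader.
assert write_restricted(["1", "Jane Doe"]).split(",") == ["1", "Jane Doe"]
```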


nextaccountic 795 days ago

can't you just do quoting?


olejorgenb 795 days ago

https://github.com/Hafthor/zsvutil?tab=readme-ov-file#what-a...

> Any escaping or encoding of these characters would make the format less human-readable, harder to parse and could introduce ambiguity and consistency problems.

Found the wording of "could introduce ambiguity and consistency problems" a bit odd, but guess they mean that even if things are specified precisely (so there's no ambiguity) not everyone would follow the rules or something? And they want to play nice with other tools following the TSV "standard"


8n4vidtmkvmk 795 days ago

Please. I wrote a csv parser a couple weeks ago in an hour or two. It's not that hard to handle the quoting and edge cases. Yes, maybe different parsers will handle them differently, but just document your choices and that's that. How is ambiguity better than completely disallowing certain chars? That's a non-starter
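
To give a sense of the size of such a parser, here is a minimal Python sketch. It is not the commenter's code; `parse_csv` is a hypothetical name, and the choices it makes are documented in the docstring, in the spirit of "just document your choices".

```python
def parse_csv(text):
    """Parse CSV text into a list of rows (lists of strings).

    Choices made by this sketch: fields are separated by commas; records
    end with LF, CR, or CRLF; a quoted field may contain commas, line
    breaks, and doubled quotes ("") that stand for one quote; a trailing
    line ending does not produce an extra empty record.
    """
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and newlines are data here
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch == "\n" or ch == "\r":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line ending
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    if field or row:
        # Final record without a trailing line ending.
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'a,b\r\n"x, y","he said ""hi"""\n"multi\nline",z'
assert parse_csv(sample) == [
    ["a", "b"],
    ["x, y", 'he said "hi"'],
    ["multi\nline", "z"],
]
```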
