by emedchill 616 days ago
Having special characters is a good idea but having a comma just to break a CSV is dumb. This would only happen if the hacker used a bad exporter or created their own (very poorly).

Yeah, this is silly. Pretty much every serializer in existence is going to handle this case. If the attacker wrote their own, then you might get lucky.
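For illustration, here is a minimal sketch of that point using Python's standard `csv` module. The sample rows and the `roundtrip` helper are made up for this example; the only claim is that a stock writer quotes a field containing a comma or a double quote, so the data survives a round trip.

```python
import csv
import io

def roundtrip(rows):
    """Write rows with the default csv dialect, then read them back."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text)))
    return text, parsed

# Hypothetical dump: a password containing a comma and a double quote.
rows = [["user", "password"], ["alice", 'pa,ss"word']]
text, parsed = roundtrip(rows)

print(text)            # the password field is wrapped in quotes, inner quote doubled
print(parsed == rows)  # the comma did not split the field
```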
AFAIU CSV is fundamentally ambiguous and can't actually be parsed in a fully deterministic way.

Edge cases get hard when dealing with nested commas, and there's no standard escape sequence.

Probably matters less with a two column arrangement, but things get really hairy really fast when you start adding types or BLOBs in the CSV.

AFAIK it's only "ambiguous" in the sense that if you get a csv file you can't determine the exact parsing behavior to use, but if you know what program created the csv (or what encoder options were used), it's not ambiguous to parse.
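A small sketch of what that looks like in practice, using Python's `csv` module. The sample line and the `parse` helper are invented for this example: the same bytes give two different, equally plausible tables depending on which delimiter you assume the exporter used.

```python
import csv
import io

def parse(text, **fmt):
    """Parse text with explicitly chosen csv format options."""
    return list(csv.reader(io.StringIO(text), **fmt))

# Hypothetical line: could be comma-delimited, or semicolon-delimited
# with decimal commas, and nothing in the file says which.
line = "1,5;2,5\n"

as_comma = parse(line, delimiter=",")      # three fields
as_semicolon = parse(line, delimiter=";")  # two fields

print(as_comma)
print(as_semicolon)
```

Once the delimiter is known, either reading is deterministic; the guesswork only exists when the producer's settings are unknown.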

>but things get really hairy really fast when you start adding types or BLOBs in the CSV.

AFAIK BLOBs are hex encoded, which makes them a non-issue.

Hah! Half the time people will even do silly things like cat together multiple CSVs from different sources.

If blobs got consistently hex encoded, that would also be nice. Base64 is common, and there are multiple types of base64 encoding people use too.
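To make the base64 point concrete, here is a quick sketch with Python's standard `base64` module. The three-byte blob is an arbitrary example picked because it hits the characters where the standard and URL-safe alphabets differ.

```python
import base64

# Arbitrary example bytes that map onto the last two alphabet positions.
blob = bytes([0xFB, 0xFF, 0xFE])

std = base64.b64encode(blob).decode("ascii")          # uses '+' and '/'
url = base64.urlsafe_b64encode(blob).decode("ascii")  # uses '-' and '_'
hexed = blob.hex()                                    # only one common spelling

print(std, url, hexed)

# Each variant only round-trips through its matching decoder.
assert base64.b64decode(std) == blob
assert base64.urlsafe_b64decode(url) == blob
assert bytes.fromhex(hexed) == blob
```

Neither base64 form contains a comma or a quote, so the CSV itself stays parseable; the trouble is that the importer has to know which alphabet the exporter picked.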

Personally, I tend to think of CSV imports as something you can expect to have a ‘yield’ - and it’s never 100%.

Yeah, so just do BSV, a bell-separated file. We already have "\n" newline-separated files. We just need a cell separator, '\a'. Problem solved.
On the plus side, accidentally cat’ing it to your terminal will be pleasantly musical.
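Half-seriously, a sketch of what that could look like in Python. The bell character is 0x07; `dump_bsv` and `load_bsv` are hypothetical names, and this is not an existing format or library. Note that it only moves the problem: a field containing a bell or a newline still has to be rejected or escaped.

```python
BEL = "\x07"  # ASCII bell, used here as the cell separator

def dump_bsv(rows):
    """Serialize rows as bell-separated cells, one row per line."""
    for row in rows:
        for field in row:
            if BEL in field or "\n" in field:
                # No escaping scheme in this sketch, so refuse such fields.
                raise ValueError(f"field contains a separator: {field!r}")
    return "".join(BEL.join(row) + "\n" for row in rows)

def load_bsv(text):
    """Parse text produced by dump_bsv (assumes a trailing newline)."""
    return [line.split(BEL) for line in text.split("\n")[:-1]]

# Commas and quotes need no special handling any more.
rows = [["a,b", 'c"d'], ["x", ""]]
encoded = dump_bsv(rows)
decoded = load_bsv(encoded)
print(decoded == rows)
```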