Hacker News
by ComputerGuru 3206 days ago
> but text can contain commas

Erm... text can contain tabs, too. This problem was solved so, so long ago, when the various ANSI/ASCII/whatever encodings were drawn up: they specifically reserved not one but two characters precisely to serve as field and record delimiters.

Decimal 30 and 31 (0x1E and 0x1F) solved not only the problem of commas or tabs in your text preventing you from treating them as field delimiters, but also let you include new lines and carriage returns in your fields, too!

31 (0x1F) is the unit separator (aka field delimiter) and 30 (0x1E) is the record separator (aka, well, the record separator).
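
For anyone who wants to try it, here is a minimal Python sketch of the idea, assuming every field is a plain string that never contains the two separator characters themselves. The names `dump` and `load` are made up for illustration, not from any library.

```python
US = "\x1f"  # unit separator (decimal 31): sits between fields
RS = "\x1e"  # record separator (decimal 30): sits between records


def dump(rows):
    """Join rows of string fields into one delimited string."""
    return RS.join(US.join(fields) for fields in rows)


def load(text):
    """Split a delimited string back into rows of fields."""
    return [record.split(US) for record in text.split(RS)]


# Commas, tabs, quotes and newlines need no quoting or escaping here.
rows = [
    ["Smith, John", "likes\ttabs"],
    ["two\nlines", 'says "hi"'],
]
assert load(dump(rows)) == rows
```

The sketch assumes at least one record; an empty list and a single empty field both serialize to the empty string, so a real format would need a rule for that case.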

I _believe_ there was a record key on some standardized keyboard layout back in the day, too.

Edit: sorry, they are decimal, not hex. Thanks @jrochkind1

4 comments

I find that no matter what you do, you will _sometimes_ need escaping. Eventually there will come a time when you want to embed an ASCII 30 (RS, Record Separator) inside a record that is itself delimited by RS. (0x usually means hex; the code is decimal 30, which is hex 0x1E.) So you'll need some method of escaping anyway. Or it'll be annoying.

I have spent some time working with MARC 21 binary encoding (used for library cataloging records) which uses ASCII 0x1D, 0x1E, and 0x1F as delimiters. I would def not call it appreciably more _convenient_ than a more modern 'text' record format. If it has benefits, convenience isn't really one of them.

I think it's common to use ESC (0x1b) and then set the high bit on the next byte, so ESC itself would be sent as 0x1b, 0x9b.
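
A rough Python sketch of that ESC-plus-high-bit scheme, under the assumption that the bytes needing protection are ESC, RS and US. The `escape` and `unescape` names and the choice of special bytes are illustrative, not taken from any particular standard.

```python
ESC, RS, US = 0x1B, 0x1E, 0x1F
SPECIAL = {ESC, RS, US}  # assumed set of bytes that must not appear raw


def escape(data: bytes) -> bytes:
    """Replace each special byte with ESC followed by that byte with bit 7 set."""
    out = bytearray()
    for b in data:
        if b in SPECIAL:
            out += bytes([ESC, b | 0x80])
        else:
            out.append(b)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Undo escape(): after an ESC, clear bit 7 of the following byte."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            b = next(it) & 0x7F  # raises StopIteration on a truncated escape
        out.append(b)
    return bytes(out)


assert escape(bytes([ESC])) == bytes([0x1B, 0x9B])
assert unescape(escape(b"a\x1eb\x1fc")) == b"a\x1eb\x1fc"
```

After escaping, a raw 0x1E or 0x1F never appears inside a field, so splitting the stream on those bytes stays safe; each field is unescaped after the split.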

Yes, but at that point the text file is basically binary - it contains exotic characters that confuse most text editors and can't be typed.

I know XML et al are frustrating, but I'd rather see them than a "creative" solution. It seems like 60% of the reason we still have to deal with archaic flat formats is support for Excel.

I have to suspect the fatal flaw is that these code points don’t look like anything and can’t be found on the keyboard.

Granted, that’s the whole point, but it also makes authoring and instruction harder. (And we all know how many programmers are really just competent copy-pasters.)

Right. And if you add 28 (0x1C, file separator) and 29 (0x1D, group separator) to the mix, then you have a whole set of nice options to concatenate multiple data files into a single stream, etc.
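
As a sketch of what that could look like in Python, assuming string fields that contain none of the separators: several tables are packed into one stream, with FS between files, RS between records and US between fields. GS is left unused here but could mark groups inside a file. The function names are hypothetical.

```python
FS = "\x1c"  # file separator (decimal 28)
GS = "\x1d"  # group separator (decimal 29), not used in this sketch
RS = "\x1e"  # record separator (decimal 30)
US = "\x1f"  # unit separator (decimal 31)


def dump_files(files):
    """Pack a list of tables (each a list of rows of fields) into one string."""
    return FS.join(
        RS.join(US.join(fields) for fields in rows) for rows in files
    )


def load_files(stream):
    """Unpack a string produced by dump_files() back into tables."""
    return [
        [record.split(US) for record in chunk.split(RS)]
        for chunk in stream.split(FS)
    ]


people = [["name", "city"], ["Ada", "London"]]
orders = [["id", "note"], ["1", "fragile, keep dry"]]
assert load_files(dump_files([people, orders])) == [people, orders]
```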