by pwdisswordfish8 1832 days ago
Another argument: text-based protocols often admit too many degrees of freedom in constructing messages, the handling of which is left underspecified and completely overlooked during implementation. (What happens if you separate lines with LF instead of CRLF in HTTP headers? What if the opening and closing HTML tags don't match? I know this should not usually happen, but how should I handle it when it does anyway?)
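
To make the LF-versus-CRLF question concrete, here is a minimal sketch of two header parsers that disagree on the same bytes. Both function names are hypothetical, and neither is meant to reflect what any real HTTP implementation or the spec requires; the point is only that the choice is easy to make silently.

```python
def parse_headers_strict(raw: bytes) -> dict:
    """Hypothetical strict parser: only CRLF ends a line; stray CR or LF is an error."""
    headers = {}
    for line in raw.split(b"\r\n"):
        if not line:
            continue
        if b"\n" in line or b"\r" in line:
            raise ValueError("bare LF or CR inside header line")
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError("header line without a colon")
        headers[name.decode("ascii").lower()] = value.strip().decode("ascii")
    return headers


def parse_headers_lenient(raw: bytes) -> dict:
    """Hypothetical lenient parser: treats CRLF and bare LF the same, skips junk lines."""
    headers = {}
    for line in raw.replace(b"\r\n", b"\n").split(b"\n"):
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # silently ignore anything that does not look like a header
        headers[name.decode("ascii").lower()] = value.strip().decode("ascii")
    return headers


# One message, first line terminated by a bare LF instead of CRLF.
ambiguous = b"Host: example.com\nX-Test: 1\r\n"
```

The strict version raises on `ambiguous`, the lenient one returns two headers. Put one of each on either side of a proxy and they no longer agree on what the message says.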

It's not by any means exclusive to text-based protocols, but there's this tendency to assume everything about a text-based protocol is ‘obvious’, ‘self-documenting’ and doesn't need specifying, and to think that just because the individual elements of the protocol are human-readable, this will somehow magically make the computers using the protocol follow the Gricean maxims (if it doesn't make sense, nobody will ever say that, therefore I don't need to think about it).

1 comments

> the handling of which is left underspecified

I used to see Postel's Law ("be conservative in what you send, be liberal in what you accept") quoted as some sort of antidote, but it seems to have fallen out of fashion -- I think enough people saw how that ideal played out in reality. Nowadays a JSON library feels justified throwing a fit if it sees a comment string instead of playing along with such shenanigans.
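
As one data point, Python's standard `json` module behaves this way. A small sketch (the helper name `accepts` and the sample documents are made up for illustration):

```python
import json


def accepts(text: str) -> bool:
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


plain = '{"retries": 3}'
with_comment = '{\n  // tuned by hand\n  "retries": 3\n}'

print(accepts(plain))         # the plain document parses
print(accepts(with_comment))  # the commented one is rejected outright
```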

> It's not by any means exclusive to text-based protocols

Plus, I would argue text-based greatly increases the surface area for ambiguity, whereas, for instance, there are only a few ways a sane person would send an integer as bytes.

I’d say it’s played out ... ambiguously. Conservative HTTP is unworkable, liberal TLS is dangerous. JSON for internal APIs and data succeeded because it’s conservative; XHTML and XML+XSLT on the open web failed for the same reason. Postel’s law is less of a universal principle than it initially seemed to be, sure, but it appears to me that part of the reason for its increasing irrelevance is our moving away from open ecosystems, not any deficiency that applied in its original context.

Integer encoding (as opposed to e.g. encoding of opaque binary strings) actually appears to be a bad example to me: various universal binary encoding protocols, self-describing or not, have an astounding number of unsigned and signed integer encodings among them. It’s like inventing a new one is a rite of passage or something.

I see your point, but I don't agree about why XHTML failed. For starters, see: https://en.wikipedia.org/wiki/WHATWG (Basically, XHTML failed because it was a pointless boondoggle, whereas HTML5 very much wasn't.)

Regarding binary integers, having written code for a few common binary protocols and file formats I've never had to think very hard about it (just: How long? Which endian? Signed?) but maybe it's different for older or more esoteric stuff.
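
Those three questions map directly onto the format codes of Python's `struct` module. A quick sketch, with the value picked arbitrarily:

```python
import struct

value = -2

le16 = struct.pack("<h", value)  # 2 bytes, little-endian, signed
be16 = struct.pack(">h", value)  # 2 bytes, big-endian, signed
le32 = struct.pack("<i", value)  # 4 bytes, little-endian, signed

# Reading the same two bytes back as unsigned gives a different number,
# which is the "Signed?" part of the checklist.
as_unsigned = struct.unpack("<H", le16)[0]

print(le16.hex(), be16.hex(), le32.hex(), as_unsigned)
```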

Re integers, it’s not the esoteric stuff, it’s the flexible, supposedly universal stuff: there’s like half a dozen varieties of varints across MessagePack, CBOR, Protobufs, ASN.1 *ER, etc.; even UTF-8 is just a (limited-range) varint encoding from a certain point of view. “Zigzag encoding” (using the least significant bit as the sign bit) is particularly insidious. And note that the (integer) exponent in IEEE floating-point formats is signed but not two’s complement: it’s in a biased representation instead.
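
For anyone who hasn't run into these, a rough sketch of one such combination: zigzag mapping followed by a base-128 varint with a continuation bit, in the style Protobuf uses for its signed varint types. The function names are my own, the zigzag part assumes values that fit in 64 bits, and other formats differ in the details, which is rather the point. The last helper pulls the raw exponent field out of an IEEE 754 double to show the bias.

```python
import struct


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode: the lowest bit carries the sign."""
    return (z >> 1) ^ -(z & 1)


def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def double_exponent_field(x: float) -> int:
    """Raw 11-bit exponent field of an IEEE 754 double (stored with a bias of 1023)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


print(encode_varint(300).hex())                 # plain unsigned varint
print(encode_varint(zigzag_encode(-1)).hex())   # small negative stays one byte
print(double_exponent_field(1.0))               # exponent 0 is stored as the bias itself
```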

Er, no, that’s not what I was referring to. The XHTML 2 story was stupid, yes (though I think the RDF / “Linked Data” tooling could’ve been really nice had it not been a fantasy), but lots and lots of people were willing to give XHTML 1.1 a chance during the XML craze and the original web standards push; except the HTML 4.01 Strict rules which XHTML 1.1 enforced were complicated enough that nobody ended up willing to tolerate showing the user literally nothing for every fumble in a server-side script. (Part of the problem was that people were routinely generating markup from textual templates.)