by horsawlarway 1832 days ago
I don't really agree... with any of this, actually.

It IS simpler to use text - He claims "Text was never as portable as it's believed to be.", but ASCII/Unicode are probably the most portable formats we've ever created. I can't think of a computer that won't be able to parse and display one of those two formats (from embedded hardware, to old F-16 parts, to my modern laptop, to the Raspberry Pi, to the fucking computer I designed in my EE classes).

Being able to type out messages is hugely helpful while debugging and developing. I copy and paste things that look exactly like the code he claims no one would ever write. It's like he doesn't understand the value of a clipboard, or of a text editor I can dump a message into and change a single value in. If the format is text, that's something I can conveniently do on pretty much any system ANYWHERE, without having to install any extra software.

His parsing example is hilarious - See that readable text above? Psh, folly! That's hard to read so let's use specialized tools that depend entirely on system-specific details and configuration (int size, byte order, struct packing, etc.) and claim that's better!
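To make those "system-specific details" concrete, here's a minimal sketch using Python's standard `struct` module. The record layout (a 1-byte flag followed by a 4-byte unsigned int) is made up for illustration; the point is that the same logical record changes size with native alignment, and the same bytes mean different numbers depending on the byte order you assume.

```python
import struct

# Same logical record: a 1-byte flag followed by a 4-byte unsigned int.
# '@' = native byte order, size and alignment (what a C struct would get).
# '<' / '>' = explicit little/big endian, standard sizes, no padding.
native_size = struct.calcsize("@BI")   # typically 8: padding before the int
packed_size = struct.calcsize("<BI")   # always 5: no alignment padding

little = struct.pack("<I", 1)          # b'\x01\x00\x00\x00'
big = struct.pack(">I", 1)             # b'\x00\x00\x00\x01'

# Reading big-endian bytes as little-endian silently gives a different number.
misread = struct.unpack("<I", big)[0]  # 16777216, not 1

print(native_size, packed_size, little, big, misread)
```

Nothing here raises an error; getting the layout wrong just produces a different number, which is exactly the kind of thing you can't eyeball in a hex dump.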

Extensibility, meh - I find this one rarely matters as much as people believe it does, but to me, the big benefit of text is that I can easily craft messages with new fields myself without having to write code to do it.
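A rough sketch of the "new fields without writing code" point, assuming a JSON payload and a reader that only looks up the keys it knows about. The field names (`host`, `port`, `retry_ms`) and the `read_config` helper are hypothetical.

```python
import json

def read_config(raw: str) -> dict:
    """Hypothetical older reader: pick out the fields it understands, ignore the rest."""
    msg = json.loads(raw)
    return {"host": msg["host"], "port": int(msg.get("port", 80))}

# Hand-crafted in a text editor: 'retry_ms' is a new field this reader predates.
old_reader_view = read_config('{"host": "example.org", "port": 8080, "retry_ms": 250}')
print(old_reader_view)  # {'host': 'example.org', 'port': 8080}
```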

Error recovery... I can sort of agree (in transit over a noisy channel, use a format that supports ECCs) but he misses that there are two different types of error here - An unexpected field value/type, and a generally malformed payload.

The first will break binary but not something like a JSON parser. The second will break both (he only talks about the second, since he assumes the failure happens at tokenization time...)
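A small sketch of the two kinds of error, using Python's `json` module and a fixed-layout binary record. The binary layout (a single big-endian 4-byte `count`) is an assumption for illustration only. With JSON, a wrong type still parses and the application can see and reject it; the fixed-layout decoder has no type information, so it can't even tell anything is wrong. A truncated payload fails in both.

```python
import json
import struct

# 1) Unexpected value/type: the JSON parser succeeds, and the application
#    can see that "count" is a string and reject it cleanly.
msg = json.loads('{"count": "oops"}')
type_ok = isinstance(msg["count"], int)   # False, detected after parsing

# A fixed-layout binary decoder has no type information: the same four
# ASCII bytes are simply reinterpreted as a (wrong) number.
(count,) = struct.unpack(">I", b"oops")

# 2) Malformed payload: both approaches fail at decode time.
try:
    json.loads('{"count": ')
    malformed_detected = False
except json.JSONDecodeError:
    malformed_detected = True

try:
    struct.unpack(">I", b"oo")            # too few bytes for the layout
    short_detected = False
except struct.error:
    short_detected = True

print(type_ok, count, malformed_detected, short_detected)
```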

Basically - my whole point boils down to "It sure seems like he's arguing for premature optimization".

If you have a spot where text is particularly expensive or inefficient, suck it up and move to a binary protocol that requires more documentation, tooling, and work. Everywhere else... it seems like a bad move.


I've taken advantage of text protocols countless (hundreds? thousands?) of times in my career to troubleshoot, learn, and experiment.

Just a few weeks ago, I needed to peek into some StatsD packets to make sure we were sending what we expected when monitoring wasn't working. If it had been a binary format, this simply would not have been an option, since this was a remote environment with very limited tooling available.
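Part of why that works: a StatsD datagram is just lines of the form `name:value|type`, optionally followed by `|@sample_rate`, so a captured packet can be read by eye or split with a few lines of code. A minimal sketch; it ignores extensions such as tags that some StatsD implementations add, and the metric names are made up.

```python
def parse_statsd(packet: bytes) -> list[tuple[str, str, str, float]]:
    """Split a StatsD datagram into (name, value, type, sample_rate) tuples."""
    metrics = []
    for line in packet.decode("utf-8").splitlines():
        if not line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split("|")
        value, mtype = fields[0], fields[1]
        rate = 1.0
        for extra in fields[2:]:
            if extra.startswith("@"):      # optional sample rate
                rate = float(extra[1:])
        metrics.append((name, value, mtype, rate))
    return metrics

captured = b"api.requests:1|c\napi.latency:320|ms|@0.5"
result = parse_statsd(captured)
print(result)
```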

> Being able to type out messages is hugely helpful while debugging and developing

You are only a small fraction of the entire life of the protocol. Protocol designs largely impact hardware design and requirements. A binary protocol will safely work on a microcontroller without too much effort, while a text-based protocol requires some serious CPU juice, code storage space and volatile memory to get off the ground.

Sure, text will work on most devices, but the parsers for those text-based protocols become excessively complex. There are some ideas floating around in the field of "green" computing, where it's becoming increasingly imperative to do more processing per watt. Text parsers will certainly not fit that bill.
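A rough sketch of the difference in work, assuming a length field carried either as ASCII decimal digits (text) or as a fixed 2-byte big-endian integer (binary). Both functions are hypothetical. The text path has to loop, validate every byte and bounds-check as it goes; the binary path is a fixed-size read. Python hides the real CPU cost, so this only shows the shape of the logic, not a measurement.

```python
def parse_ascii_length(field: bytes, limit: int = 65535) -> int:
    """Text: validate and accumulate decimal digits one byte at a time."""
    if not field:
        raise ValueError("empty length")
    value = 0
    for b in field:
        if not 0x30 <= b <= 0x39:          # only '0'..'9' allowed
            raise ValueError(f"non-digit byte {b:#x}")
        value = value * 10 + (b - 0x30)
        if value > limit:                  # stop before it grows unbounded
            raise ValueError("length too large")
    return value

def parse_binary_length(field: bytes) -> int:
    """Binary: exactly two bytes, big-endian, no per-byte validation loop."""
    if len(field) != 2:
        raise ValueError("expected 2 bytes")
    return (field[0] << 8) | field[1]

print(parse_ascii_length(b"1500"), parse_binary_length(b"\x05\xdc"))
```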

> new fields myself without having to write code to do it.

Then what's the point of that new field if there's no code to handle it? That said, CBOR and TLV are similar in that respect: you can add new fields without any code to handle them. But what good is it?
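To illustrate the TLV point, a minimal decoder for a hypothetical layout of 1-byte type, 1-byte length, then `length` bytes of value. Because every element carries its own length, a reader can skip tags it has no code for. The tag numbers and names are invented for the example; real CBOR encodes lengths differently and is considerably more involved.

```python
KNOWN_TAGS = {0x01: "id", 0x02: "name"}   # hypothetical tag assignments

def decode_tlv(buf: bytes) -> tuple[dict, list[int]]:
    """Return (known fields, list of unknown tags that were skipped)."""
    fields, skipped = {}, []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated header")
        tag, length = buf[i], buf[i + 1]
        start, end = i + 2, i + 2 + length
        if end > len(buf):
            raise ValueError("truncated value")
        if tag in KNOWN_TAGS:
            fields[KNOWN_TAGS[tag]] = buf[start:end]
        else:
            skipped.append(tag)           # no code for it: skip by length
        i = end
    return fields, skipped

# Tag 0x7f is a "new field" this reader predates.
msg = bytes([0x01, 0x01, 0x2a, 0x7f, 0x02, 0xde, 0xad, 0x02, 0x02]) + b"ok"
fields, skipped = decode_tlv(msg)
print(fields, skipped)  # {'id': b'*', 'name': b'ok'} [127]
```

Which also answers "what good is it?" only partially: the old reader survives the new field, but it still can't do anything with it.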

> An unexpected field value/type, and a generally malformed payload.

    Content-Length: bad-string
Your parser will split out the key-value pair and hand it off to a callback, or something else that makes sense of what the key means, perhaps a semantic analyzer. By the time it reaches that point, you have already irreversibly wasted CPU time, only to discover that the whole message is invalid. It's not that I'm unaware of the difference; it's that differentiating between the two is often pointless.
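A sketch of where the error in `Content-Length: bad-string` actually surfaces, assuming a simple two-stage design: a generic header splitter, then a per-header semantic check. Both function names are hypothetical. The syntax stage happily accepts the line; the problem is only found after that work is done.

```python
def split_headers(raw: str) -> dict[str, str]:
    """Stage 1: syntax only. Any 'Name: value' line is accepted."""
    headers = {}
    for line in raw.split("\r\n"):
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return headers

def check_headers(headers: dict[str, str]) -> int:
    """Stage 2: semantics. Only here do we learn the value is unusable."""
    value = headers.get("content-length", "0")
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"bad Content-Length: {value!r}")
    return int(value)

raw = "Content-Type: application/sdp\r\nContent-Length: bad-string\r\n"
hdrs = split_headers(raw)      # succeeds: the parsing work is already spent
try:
    check_headers(hdrs)
    rejected = False
except ValueError:
    rejected = True
print(hdrs, rejected)
```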

Continuing the hilarity...

> That's hard to read so let's use specialized

Is it CR? Is it LF? Is it CRLF? Did I configure my text editor to use the correct line terminator? Does my clipboard convert CRLF to CR or LF? Oh, wait, is that a space (0x20) or a tab (0x09) there? Hmm... never mind.
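A small sketch of how much the splitter's choice matters. Python's `str.splitlines()` treats CR, LF and CRLF alike, whereas a strict parser that only accepts CRLF (as many text protocols require) does not; `repr()` at least makes tab vs. space visible. The sample message is made up.

```python
sample = "Via: a\r\nFrom:\tb\nTo: c\r"

# Lenient: CR, LF and CRLF all end a line.
lenient = sample.splitlines()

# Strict (a CRLF-only protocol parser): only "\r\n" ends a line.
strict = sample.split("\r\n")

for line in lenient:
    print(repr(line))      # repr() shows '\t' vs ' ' explicitly
print(len(lenient), len(strict))  # 3 2
```

Same bytes, two different message structures, depending on which assumption the parser made.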

Also,

    From: 1234<1234@example.org>;branch=abcd1234
vs

    From: 1234<1234@example.org;branch=abcd1234>
Which is the right way? Should my parser expect `branch` inside the URI, or only when parsing the `From` and `To` addresses? Should I make it part of the URI sub-parser or part of the top-level parser for endpoint addresses? This was a real interop problem between two big vendors.
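A minimal sketch of the ambiguity, using the convention that parameters inside the angle brackets belong to the URI and parameters after `>` belong to the header field. This roughly mirrors how SIP name-addr values are structured, but `split_name_addr` is a hypothetical simplification, not an RFC 3261 parser. The two inputs from above put `branch` in different buckets.

```python
def split_name_addr(value: str) -> tuple[str, dict, dict]:
    """Return (uri, uri_params, header_params) for 'display<uri;p=v>;q=w'."""
    lt, gt = value.index("<"), value.index(">")
    inside, after = value[lt + 1:gt], value[gt + 1:]

    def params(s: str) -> dict:
        out = {}
        for item in filter(None, s.split(";")):   # skip empty segments
            key, _, val = item.partition("=")
            out[key] = val
        return out

    uri, _, uri_part = inside.partition(";")
    return uri, params(uri_part), params(after)

a = split_name_addr("1234<1234@example.org>;branch=abcd1234")
b = split_name_addr("1234<1234@example.org;branch=abcd1234>")
print(a)  # ('1234@example.org', {}, {'branch': 'abcd1234'})
print(b)  # ('1234@example.org', {'branch': 'abcd1234'}, {})
```

Two implementations that pick different buckets will both "parse" the message and still disagree about what it says.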