Hacker News
by userbinator 2189 days ago
Or even better: if you can change the problem so that you don't need to parse at all, then do that instead.

Recently I made a comment on the overhead of textual formats from the validation perspective: https://news.ycombinator.com/item?id=23582056

I think that a textual format is, overall, a horrible idea if the use case is not presenting the vast majority of the content to humans.

In an ideal world, communication between machines would be all in binary protocols, and developers would know how to read/write them with tools like hex editors as naturally as a second language.

Instead, enormous amounts of time and space are wasted by machines converting their native integers into strings, wrapping them in JSON, base64-encoding that, wrapping the result in XML, and then sending it over the network (whose lower layers are thankfully binary) to another machine, where the reverse process happens, but with additional checks during parsing. (I am not exaggerating. I have seen systems like this.) 99.99999...% of this data will never be seen by a human. What a disgusting waste of computing power.
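To put rough numbers on that layering, here is a minimal Python sketch. The payload (1,000 consecutive seven-digit integers) and the `<payload>` element name are made up purely for illustration, and it compares byte counts only, not CPU cost.

```python
import base64
import json
import struct
from xml.sax.saxutils import escape


def binary_encoding(values):
    # Each value as a little-endian unsigned 32-bit integer: 4 bytes apiece.
    return struct.pack(f"<{len(values)}I", *values)


def layered_text_encoding(values):
    # Integers -> decimal strings inside a JSON array.
    as_json = json.dumps(values).encode("utf-8")
    # JSON bytes -> base64 text.
    as_b64 = base64.b64encode(as_json).decode("ascii")
    # base64 text -> wrapped in an XML element (hypothetical element name).
    return f"<payload>{escape(as_b64)}</payload>".encode("utf-8")


def decode_layered(blob):
    # The reverse trip the receiving machine has to make.
    text = blob.decode("utf-8")
    inner = text[text.index(">") + 1 : text.rindex("<")]
    return json.loads(base64.b64decode(inner))


if __name__ == "__main__":
    # Hypothetical payload, chosen only for illustration.
    values = list(range(1_000_000, 1_001_000))
    raw = binary_encoding(values)
    wrapped = layered_text_encoding(values)
    assert decode_layered(wrapped) == values
    assert struct.unpack(f"<{len(values)}I", raw) == tuple(values)
    print(len(raw), len(wrapped))  # 4000 12019
```

With this payload the binary form is 4,000 bytes and the JSON/base64/XML form is 12,019 bytes, about three times larger, before counting any of the parsing work on the receiving end.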

"The fastest way to do something is to not do it at all."


Author of the article here: I completely agree. This text parsing problem came up because I was lamenting how many cycles were wasted on text processing. At my current job, a text-based protocol from a third party means that around 80% of the CPU time for the entire application is spent parsing and rendering text just for the protocol. JSON and HTTP come to mind too.

The human-readability argument doesn't really hold water, because if you have a structured description of a protocol (e.g. a C struct), you can always write simple tools to inspect it and make it just as human-readable as JSON. This is even easier in a language with reflection, which can generate all of that code for you.
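A minimal Python sketch of that idea, assuming a hypothetical `Quote` message whose layout plays the role of the C struct. Python's `dataclasses.fields()` reflection derives both the `struct` format string and a JSON view, so the readable form never has to be written by hand. The `wire` helper and all field names are invented for illustration.

```python
import json
import struct
from dataclasses import asdict, dataclass, field, fields


def wire(fmt):
    # Hypothetical helper: tag a field with its struct format code.
    return field(metadata={"fmt": fmt})


@dataclass
class Quote:
    # Hypothetical message layout standing in for a C struct.
    symbol_id: int = wire("I")    # uint32
    price_ticks: int = wire("q")  # int64
    quantity: int = wire("i")     # int32
    side: int = wire("B")         # uint8


def layout(cls):
    # Reflect over the fields to build a little-endian, unpadded format string.
    return "<" + "".join(f.metadata["fmt"] for f in fields(cls))


def encode(msg):
    cls = type(msg)
    return struct.pack(layout(cls), *(getattr(msg, f.name) for f in fields(cls)))


def decode(cls, data):
    # Unpacked values come back in field order, matching the constructor.
    return cls(*struct.unpack(layout(cls), data))


def render(cls, data):
    # Human-readable view generated from the type description alone.
    return json.dumps(asdict(decode(cls, data)), indent=2)


if __name__ == "__main__":
    blob = encode(Quote(symbol_id=42, price_ticks=1_234_500, quantity=-10, side=1))
    print(len(blob))  # 17
    print(render(Quote, blob))
```

The wire format stays a compact 17 bytes (4 + 8 + 4 + 1, no padding under `<`), and the indented JSON view is produced only when a human actually wants to look, for example in a debugging tool.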

What prevents you from using CBOR? I do understand that a fixed schema is something of a shackle and can't be used in every situation, but CBOR doesn't require an external schema.
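To show what "no external schema" means in practice, here is a partial Python sketch of a CBOR encoder covering only a subset of RFC 8949: unsigned and negative integers, byte strings, text strings, arrays, maps, booleans and null (no floats, tags, or indefinite-length items). Every item carries its own type and length in its header, and map keys travel with the data, so a decoder needs no schema; byte strings are embedded raw rather than base64'd. `cbor_encode` is an illustrative name, not a library API; for real use a maintained library such as the third-party `cbor2` package would be the safer choice.

```python
import struct


def _head(major, value):
    # Initial byte = 3-bit major type + 5-bit additional info; larger
    # arguments follow as 1, 2, 4, or 8 big-endian bytes.
    if value < 24:
        return bytes([(major << 5) | value])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if value < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise ValueError("argument too large for CBOR")


def cbor_encode(obj):
    if obj is None:
        return b"\xf6"                      # simple value 22 (null)
    # Check bool before int, since bool is a subclass of int in Python.
    if isinstance(obj, bool):
        return b"\xf5" if obj else b"\xf4"  # simple values 21 (true) / 20 (false)
    if isinstance(obj, int):
        # Major type 0 for n >= 0; major type 1 encodes -1 - n.
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, (bytes, bytearray)):
        return _head(2, len(obj)) + bytes(obj)  # raw bytes, no base64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        # Keys are encoded inline with the values: no external schema needed.
        return _head(5, len(obj)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
    raise TypeError(f"unsupported type in this sketch: {type(obj).__name__}")


if __name__ == "__main__":
    print(cbor_encode({"a": 1, "b": [2, 3]}).hex())  # a26161016162820203
```

The map `{"a": 1, "b": [2, 3]}` comes out as the 9 bytes `a2 61 61 01 61 62 82 02 03`, matching the corresponding example in RFC 8949's Appendix A, and a generic decoder can turn those bytes back into a readable tree without knowing anything about the message in advance.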