Hacker News
by ritter2a 2189 days ago
Certainly a nice read.

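On the "culture of 'optimization is the root of all evil'" remark in the conclusions: I find this to be a nice illustration of why the full Knuth quote matters. It warns specifically against _premature_ optimization, and goes on to say that "we should not pass up our opportunities in that critical 3%".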

If you face an arbitrary task that includes parsing 64-bit integers, starting out by developing or using the technique from the article (as a _premature_ optimization) is probably a bad idea. It costs time to implement (and even more time to debug and understand the code a few months later), while in most cases integer parsing is probably not what dominates the running time of your code. If, however, you have built a solution that does the job but is just not fast enough, and profiling shows that you spend considerable time parsing integers, this kind of optimization is the way to go.
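The article's exact code is not reproduced here. As a hedged illustration of the trade-off, here is a minimal C sketch contrasting the readable baseline you would write first with the well-known SWAR ("SIMD within a register") trick for converting eight ASCII digits at once. The function names are hypothetical, and the SWAR version assumes a little-endian CPU and input already validated as digits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline: the obvious loop you would write first.
   Assumes s holds `len` ASCII digits; no sign, no validation, no overflow check. */
uint64_t parse_u64_naive(const char *s, size_t len) {
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = v * 10 + (uint64_t)(s[i] - '0');
    return v;
}

/* SWAR: converts exactly 8 ASCII digits using three multiply-and-shift
   steps instead of eight multiply-adds.
   Assumes a little-endian CPU and that all 8 bytes are '0'..'9'. */
uint64_t parse_eight_digits_swar(const char *s) {
    uint64_t v;
    memcpy(&v, s, sizeof v);  /* first character lands in the lowest byte */
    v = (v & 0x0F0F0F0F0F0F0F0FULL) * 2561 >> 8;      /* even bytes: 10*a + b      */
    v = (v & 0x00FF00FF00FF00FFULL) * 6553601 >> 16;  /* 16-bit lanes: 100*ab + cd */
    return (v & 0x0000FFFF0000FFFFULL) * 42949672960001ULL >> 32; /* 10000*abcd + efgh */
}

/* 16 digits = two 8-digit chunks (same assumptions as above). */
uint64_t parse_sixteen_digits_swar(const char *s) {
    return parse_eight_digits_swar(s) * 100000000ULL + parse_eight_digits_swar(s + 8);
}
```

The magic constants are 10*256+1, 100*65536+1 and 10000*2^32+1, so each step combines neighbouring lanes. That is also why this version is much harder to read and debug than the loop, which is exactly the cost to weigh against what the profiler shows.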


Or even better: if you can change the problem so that you don't need to parse at all, then do that instead.

I recently commented on the overhead of textual formats from a validation perspective: https://news.ycombinator.com/item?id=23582056

Overall, I think a textual format is a horrible idea unless the use case is presenting the vast majority of the content to humans.

In an ideal world, communication between machines would be all in binary protocols, and developers would know how to read/write them with tools like hex editors as naturally as a second language.

Instead, enormous amounts of time and space are wasted by machines converting their native integers into strings, wrapping them in JSON, base64-encoding that, wrapping the result in XML, and then sending it over the network (whose lower layers are thankfully binary) to another machine, where the reverse process happens, with additional checks during parsing. (I am not exaggerating; I have seen systems like this.) 99.99999...% of this data will never be seen by a human. What a disgusting waste of computing power.
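For contrast with that string/JSON/base64 round trip, here is a minimal C sketch of moving a native 64-bit integer in binary: a fixed 8-byte layout written and read with shifts, plus a dump that shows the bytes the way a hex editor would. The little-endian wire order and the function names are assumptions for illustration, not part of any particular protocol.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes v as 8 bytes, least significant byte first (assumed wire format).
   Uses shifts, so it behaves the same on any host byte order. */
void put_u64_le(uint8_t out[8], uint64_t v) {
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Reads the same layout back: no digit loop, no error cases. */
uint64_t get_u64_le(const uint8_t in[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}

/* Prints bytes as space-separated hex pairs, as a hex editor shows them. */
void hex_dump(const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        printf("%02x%c", p[i], i + 1 == n ? '\n' : ' ');
}
```

For example, `put_u64_le(buf, 1000000)` followed by `hex_dump(buf, 8)` prints `40 42 0f 00 00 00 00 00`. The value always occupies 8 bytes and decodes with a handful of shifts, with no string conversion or escaping on either side.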

"The fastest way to do something is to not do it at all."

Author of the article here - I completely agree. This text-parsing problem came up because I was lamenting how many cycles are wasted on text processing. At my current job, a text-based protocol from a third party means that around 80% of the CPU time for the entire application is spent parsing and rendering text just for that protocol. JSON and HTTP come to mind too.

The human-readability argument doesn't really hold water: if you have a structured description of a protocol (e.g. a C struct), you can always write simple tools to inspect it and make it just as human-readable as JSON. This is even easier if the language has reflection to generate all this code.
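A minimal C sketch of the kind of "simple tool" described above. The message layout (`struct order_msg`, its fields, and the packed 14-byte little-endian wire format) is entirely hypothetical. The point is that a few lines turn binary bytes into text as readable as JSON.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-layout message; field names are made up for illustration.
   Assumed wire format: packed, little-endian, 8 + 4 + 2 = 14 bytes. */
struct order_msg {
    uint64_t order_id;
    uint32_t price_cents;
    uint16_t quantity;
};

/* Reads an nbytes-wide little-endian unsigned integer. */
static uint64_t rd_le(const uint8_t *p, int nbytes) {
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decodes field by field rather than memcpy'ing into the struct,
   so compiler padding in struct order_msg does not matter. */
struct order_msg decode_order(const uint8_t wire[14]) {
    struct order_msg m;
    m.order_id = rd_le(wire, 8);
    m.price_cents = (uint32_t)rd_le(wire + 8, 4);
    m.quantity = (uint16_t)rd_le(wire + 12, 2);
    return m;
}

/* Renders the message as JSON-like text for a human reader.
   Returns the length snprintf would have written. */
int format_order(char *out, size_t cap, const struct order_msg *m) {
    return snprintf(out, cap,
                    "{\"order_id\": %llu, \"price_cents\": %lu, \"quantity\": %u}",
                    (unsigned long long)m->order_id,
                    (unsigned long)m->price_cents,
                    (unsigned)m->quantity);
}
```

With reflection or a code generator, `decode_order` and `format_order` could be produced from the struct definition instead of being written by hand.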

What prevents you from using CBOR? I understand that a fixed schema can be a shackle and can't be used in every situation, but CBOR doesn't require an external schema.
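To show how lightweight CBOR's integer encoding is, here is a minimal C sketch that encodes and decodes an unsigned integer (major type 0) following RFC 8949. It covers only this one data type and is not a complete CBOR library. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes v as a CBOR unsigned integer (major type 0):
   0..23 fit in the initial byte; larger values use additional info
   24/25/26/27 followed by 1/2/4/8 big-endian bytes.
   Returns the number of bytes written (at most 9). */
size_t cbor_encode_uint(uint8_t out[9], uint64_t v) {
    int nbytes;
    uint8_t info;
    if (v < 24) {
        out[0] = (uint8_t)v;  /* (0 << 5) | value */
        return 1;
    } else if (v <= UINT8_MAX)  { info = 24; nbytes = 1; }
    else if (v <= UINT16_MAX)   { info = 25; nbytes = 2; }
    else if (v <= UINT32_MAX)   { info = 26; nbytes = 4; }
    else                        { info = 27; nbytes = 8; }
    out[0] = info;  /* (0 << 5) | additional info */
    for (int i = 0; i < nbytes; i++)
        out[1 + i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));  /* big-endian */
    return (size_t)(1 + nbytes);
}

/* Decodes a CBOR unsigned integer. Returns bytes consumed, or 0 if the item
   is not major type 0, uses reserved/indefinite info, or is truncated.
   Non-minimal encodings are accepted. */
size_t cbor_decode_uint(const uint8_t *in, size_t len, uint64_t *v) {
    if (len == 0 || (in[0] >> 5) != 0) return 0;
    uint8_t info = in[0] & 0x1f;
    if (info < 24) { *v = info; return 1; }
    if (info > 27) return 0;
    size_t nbytes = (size_t)1 << (info - 24);  /* 24->1, 25->2, 26->4, 27->8 */
    if (len < 1 + nbytes) return 0;
    uint64_t x = 0;
    for (size_t i = 0; i < nbytes; i++)
        x = (x << 8) | in[1 + i];
    *v = x;
    return 1 + nbytes;
}
```

For instance, 1000 encodes to the three bytes `19 03 e8`, and decoding needs no digit loop at all, since the length of the value is known from the first byte.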