by kelnos 2297 days ago
You should be doing that with JSON as well, so this isn't a pro/con of either format.

That was obviously my point.

i.e. just because MessagePack is a binary format, it doesn't mean you can skip the same string checks that JSON requires, which means parsing MessagePack strings is unlikely to be any faster than parsing JSON strings (contrary to what others have implied with the "just memcpy" comments). It's just that with JSON the validation is done as part of the parser (remember JSON only technically supports a subset of ASCII and any extended characters or Unicode is encoded via escape codes), whereas with MessagePack you'd need to do that validation as an additional step.
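
To make the "additional step" concrete, here is a rough Python sketch of reading a MessagePack fixstr (header byte 0xa0 to 0xbf, length in the low 5 bits) by hand. The function names are mine, purely for illustration, and this isn't how any particular MessagePack library does it. The raw read is just a slice; the UTF-8 check is a separate pass over the same bytes.

```python
def read_fixstr_raw(buf: bytes) -> bytes:
    """Return the payload of a MessagePack fixstr WITHOUT validating it."""
    header = buf[0]
    if header & 0xE0 != 0xA0:          # fixstr headers are 101xxxxx
        raise ValueError("not a fixstr")
    length = header & 0x1F             # low 5 bits hold the byte length
    payload = buf[1:1 + length]        # the "just memcpy" part
    if len(payload) != length:
        raise ValueError("truncated fixstr")
    return payload


def read_fixstr_checked(buf: bytes) -> str:
    """Same read, plus the extra validation pass over the payload."""
    # decode() walks every byte and raises UnicodeDecodeError on bad UTF-8
    return read_fixstr_raw(buf).decode("utf-8")


good = bytes([0xA2]) + "hi".encode("utf-8")
bad = bytes([0xA2, 0xC3, 0x28])        # 0xC3 must be followed by 0x80..0xBF

print(read_fixstr_raw(good), read_fixstr_checked(good))
print(read_fixstr_raw(bad))            # raw read happily returns invalid bytes
try:
    read_fixstr_checked(bad)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

The raw path accepts the malformed payload without complaint; only the checked path catches it, and that check is where the cost goes.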

Integers, on the other hand, might differ, since JSON would need additional validation (again, baked into the parser) which MessagePack would not, because MessagePack encodes integers as binary integers whereas JSON encodes them as ASCII digits that need converting back to binary integers.
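
A small Python sketch of the integer difference, with hypothetical helper names and covering only the MessagePack uint 32 case (type byte 0xce followed by 4 big-endian bytes). The binary form is a fixed-width unpack; the JSON form has to check each character against the number grammar before converting.

```python
import struct


def decode_msgpack_uint32(buf: bytes) -> int:
    """Decode a MessagePack uint 32: 0xce + 4 big-endian bytes."""
    if len(buf) != 5 or buf[0] != 0xCE:
        raise ValueError("not a uint 32")
    return struct.unpack(">I", buf[1:5])[0]   # fixed width, no per-digit work


def decode_json_int(text: str) -> int:
    """Decode a JSON integer literal, validating it character by character."""
    digits = text[1:] if text.startswith("-") else text
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("not a JSON integer")
    if len(digits) > 1 and digits[0] == "0":
        raise ValueError("JSON does not allow leading zeros")
    return int(text)                          # ASCII digits back to binary


print(decode_msgpack_uint32(bytes([0xCE, 0x00, 0x00, 0x01, 0x00])))
print(decode_json_int("256"))
```

Both calls yield the same value; the point is only that the JSON path does validation and base-10 conversion that the binary path never needs.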

(hint: read the message I'm replying to).

Many (most?) applications do not actually care whether a byte blob of text is structurally valid UTF-8. They are either passing it around as an opaque byte blob, or already applying much stricter application-specific validation. Validating UTF-8 automatically at the serialization layer is a huge waste of cycles, especially in a big distributed system.
On closed systems where you control both the input and output, then sure (though I’d still recommend against that particular shortcut because it’s an easy way for bugs to go undetected).

However, if you’re accepting MessagePack-encoded data from insecure systems (such as end users) then you absolutely should be validating your input somewhere along the pipeline, and it’s usually better to do that early on.

Also, it’s not generally the distributed systems you worry about when it comes to this specific degree of micro-optimisation (which is basically what this is). It’s the monolithic ones. Distributed architecture is meant to solve various problems (for example, but not limited to, high availability, reduced geographical latency, running a single site on cheaper commodity hardware, etc.) but often at the cost of CPU cycles. Whereas monolithic infrastructures with fewer servers (such as Stack Overflow’s setup) would be far more dependent on reducing computational overhead wherever corners can be cut. However, they’d also be significantly less likely to need networked RPCs via MessagePack anyway (simply due to the monolithic design of their architecture).