by raphlinus 2674 days ago
It's a complicated tradeoff. It's not just performance; the main thing is clear code. Another factor was support across a wide variety of languages, which was thinner for things like flatbuffers at the time we adopted JSON. Also, "clever implementations" like simdjson don't have a high cost if they're nice open source libraries.

The problem with clever implementations isn't that they can't be reused or that they have abnormally high cost for end users (though this is sometimes the case). It's that they inherently require more work to author, maintain, and debug over time. When you're talking about a cross-language protocol that will have a myriad of implementations (each with different constraints), it's not unreasonable to look at how much work a third party must do to get such a "clever" implementation (or, in other words, "how many people could reimplement simdjson?"). And if the existing clever implementations aren't available (or viable) for some use case, then you're out of luck and starting from square one. This happens more often than you think.

In this case there's a lot of work already put into fast JSON parsers, but in general JSON is not a very friendly format to work with or to write efficient, generalized implementations of. Maybe it's not worth switching to something else. I'm not saying you should; it seems like a fine choice to me. But clever implementations don't come free, and representation choice has a big impact on how "clever" you need to be.

Re clear code: to my mind it comes out pretty much the same regardless of serialization† format. The best approach is to have the protocol written down in some real language (e.g. a flatbuffers schema or annotated Rust structs or whatever) and codegen for the target languages.
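
To make the schema-plus-codegen idea concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `SCHEMA` dict format, the `Edit` message and its fields, and the two emitter functions. Real tools (the flatbuffers compiler, for example) do far more than this.

```python
# Hypothetical schema: one message, fields as (name, abstract type) pairs.
# This dict format is an assumption for the sketch, not any real schema language.
SCHEMA = {
    "name": "Edit",
    "fields": [("rev", "u64"), ("start", "u32"), ("text", "string")],
}

# Mapping from the abstract schema types to each target language's types.
RUST_TYPES = {"u64": "u64", "u32": "u32", "string": "String"}
TS_TYPES = {"u64": "bigint", "u32": "number", "string": "string"}


def emit_rust(schema):
    """Return the source text of a Rust struct for the schema."""
    lines = [f"pub struct {schema['name']} {{"]
    for name, ty in schema["fields"]:
        lines.append(f"    pub {name}: {RUST_TYPES[ty]},")
    lines.append("}")
    return "\n".join(lines)


def emit_typescript(schema):
    """Return the source text of a TypeScript interface for the schema."""
    lines = [f"export interface {schema['name']} {{"]
    for name, ty in schema["fields"]:
        lines.append(f"  {name}: {TS_TYPES[ty]};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_rust(SCHEMA))
    print(emit_typescript(SCHEMA))
```

The point is only that the protocol lives in one place and each language's types fall out of it, whatever ends up on the wire.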

My guess is it's easier to write an efficient flatbuffers (or similar) serializer+deserializer than an efficient JSON serializer+deserializer. And the top end of performance is definitely higher.
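
A toy sketch of why that might be, in Python with only the standard library. The record layout below is a made-up fixed-offset format, not the actual flatbuffers wire format, and the field names are hypothetical. With fixed offsets, reading one field is a single bounded read; with JSON, the whole text has to be tokenized first.

```python
import json
import struct

# Hypothetical fixed layout, little-endian, no padding:
#   byte 0:  rev    (u64)
#   byte 8:  start  (u32)
#   byte 12: length (u32)
RECORD = struct.Struct("<QII")
LENGTH_OFFSET = 12


def encode_binary(rev, start, length):
    """Pack the three fields into a 16-byte record."""
    return RECORD.pack(rev, start, length)


def read_length_binary(buf):
    # One read at a known offset; the other fields are never touched.
    return struct.unpack_from("<I", buf, LENGTH_OFFSET)[0]


def read_length_json(buf):
    # The entire document is parsed before any single field is available.
    return json.loads(buf)["length"]


if __name__ == "__main__":
    binary = encode_binary(7, 100, 42)
    text = json.dumps({"rev": 7, "start": 100, "length": 42}).encode()
    print(read_length_binary(binary))
    print(read_length_json(text))
```

This says nothing about real-world numbers; it only shows where the extra work in a JSON decoder comes from and why a fast one needs more cleverness.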

So if you're already reaching the point of needing to write your own JSON deserializers...

(† Unless you're talking about some hand-written bespoke binary format, but that would almost certainly be crazy.)