Why does Xi use JSON in the first place? It would be easier and faster to use a binary format, e.g. Protobuf or FlatBuffers, or CBOR if JSON's semantics are needed.
> JSON. The protocol for front-end / back-end communication, as well as between the back-end and plug-ins, is based on simple JSON messages. I considered binary formats, but the actual improvement in performance would be completely in the noise. Using JSON considerably lowers friction for developing plug-ins, as it’s available out of the box for most modern languages, and there are plenty of the libraries available for the other ones.
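To make "simple JSON messages" a bit more concrete, here is a minimal sketch of one common way to frame such messages: one JSON object per line over a text stream. The `method`/`params`/`id` shape, the method names `insert` and `scroll`, and the newline framing are assumptions for illustration, not a description of Xi's actual protocol.

```python
from __future__ import annotations

import io
import json
from typing import Any, Iterator


def write_message(stream: io.TextIOBase, method: str, params: dict[str, Any],
                  msg_id: int | None = None) -> None:
    """Write one message as a single line of JSON (hypothetical message shape)."""
    msg: dict[str, Any] = {"method": method, "params": params}
    if msg_id is not None:
        msg["id"] = msg_id  # requests carry an id; notifications don't
    # json.dumps escapes newlines inside strings, so one message == one line.
    stream.write(json.dumps(msg, separators=(",", ":")) + "\n")


def read_messages(stream: io.TextIOBase) -> Iterator[dict[str, Any]]:
    """Decode each non-empty line of the stream as one JSON message."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)
```

The appeal for plug-in authors is that both halves are a few lines on top of whatever JSON library their language already ships.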
We actually do get 60fps, but JSON parsing on the Swift side takes more than its share of total CPU load, affecting power consumption among other things. So (partly to address the trolls elsewhere in the thread), the choice of JSON does not preclude fast implementation (as the existence of simdjson proves), but it does make it dependent on the language having a performant JSON implementation. I made the assumption that this would be the case, and for Swift it isn't.
At some point though, isn't it maybe easier just to use an inherently more efficient format than trying to rely on clever implementations to save you?
I totally get json for public internet services where you want to have lots of consumers and using a more efficient format would be significant friction, but writing an editor frontend is a very large endeavor -- it seems like the extra work of adopting something more efficient than json (like flatbuffers or whatever) would really be in the noise.
It's a complicated tradeoff. It's not just performance; the main thing is clear code. Another factor was support across a wide variety of languages, which was thinner for things like FlatBuffers at the time we adopted JSON. Also, "clever implementations" like simdjson don't have a high cost if they're nice open source libraries.
The problem with clever implementations isn't that they can't be reused or that they have abnormally high cost for end users (though this is sometimes the case). It's that they inherently require more work to author, maintain, and debug over time. When you're talking about a cross-language protocol that will have myriad implementations (each with different constraints), it's not unreasonable to look at how much work a third party must do to get such a "clever" implementation (or, in other words, "how many people could reimplement simdjson?"). And if the existing clever implementations aren't available (or viable) for some use case, then you're out of luck and back at square one. This happens more often than you think.
In this case there's a lot of work already put into fast JSON parsers, but in general JSON is not a very friendly format to work with or write efficient, generalized implementations of. Maybe it's not worth switching to something else. I'm not saying you should, it seems like a fine choice to me. But clever implementations don't come free and representation choice has a big impact on how "clever" you need to be.
Re clear code, to my mind it comes out pretty much the same regardless of serialization† format: the best approach is to have the protocol written down in some real language (e.g. a FlatBuffers schema or annotated Rust structs or whatever), and generate code for the target languages.
My guess is it's easier to write an efficient FlatBuffers (or similar) serializer+deserializer than an efficient JSON serializer+deserializer. And the performance ceiling is definitely higher.
So if you're already reaching the point of needing to write your own json deserializers...
(† Unless you're talking about some hand-written bespoke binary format, but that would almost certainly be crazy.)
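A rough Python analogue of "write the protocol down once and derive the serializers": the dataclass acts as the schema, and one generic pair of functions handles every message type. Real tools (FlatBuffers' `flatc`, serde's derive) do this at build time; this sketch does it at runtime with `dataclasses.fields`. `ScrollTo` and its fields are hypothetical, and only flat messages with `int`/`str`/`bool` fields are supported.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, Type, TypeVar, get_type_hints

T = TypeVar("T")


@dataclass
class ScrollTo:  # hypothetical message; the class definition *is* the schema
    view_id: str
    line: int
    col: int


def to_json(msg: Any) -> str:
    # Field names come straight from the dataclass definition.
    return json.dumps(asdict(msg))


def from_json(cls: Type[T], text: str) -> T:
    """Decode text and check it against cls (flat int/str/bool fields only)."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise TypeError(f"expected a JSON object for {cls.__name__}")
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in raw:
            raise KeyError(f"missing field {f.name!r}")
        value, expected = raw[f.name], hints[f.name]
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if not isinstance(value, expected) or (
            isinstance(value, bool) and expected is not bool
        ):
            raise TypeError(
                f"field {f.name!r}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        kwargs[f.name] = value
    return cls(**kwargs)
```

Note that decoding can still fail at runtime (wrong type, missing field); some code has to handle that error no matter how the schema is expressed.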
One of the other performant libraries in the comparison section of simdjson, sajson, has a Swift wrapper: https://github.com/chadaustin/sajson. I haven't tried it, but one option would be to bring that up to date. Another option: now that Swift 5 strings use UTF-8 as their native encoding, it may be possible to write a fast JSON parser in native Swift. Likely someone already has, or is doing that.
Given equally high-quality JSON and binary serdes, JSON is sufficiently fast. Raphlinus is saying that Swift's built-in deserialiser is obnoxiously slow.
Xi has multiple components written in multiple languages. In the Rust core, JSON de/serialization is not a problem, but Swift lacks a comparable high-performance library.
I'm going off topic at this point but I'd think for a native app the main advantages of a binary format would be the static typing and code generation that come from using an IDL.
I'm familiar with serde. It's an incredible project, but I wouldn't quite call it "static typing for JSON". You still have to unwrap the parse at some point. However, I will concede the point that if you have Rust on both sides then you'll get most of the benefits.
Because JSON encoding/decoding was not found to be a typical performance bottleneck, and because JSON is supported in virtually every programming language (Xi allows you to write frontends in pretty much any language you want).
After spending most of a year doing deep surgery on systems that used CBOR extensively, I can report that the common CBOR parsers are not faster than common JSON parsers; surprisingly, they are actually slower. CBOR is also not easier; it's much less widely supported, and you need a separate debugging representation. It does have three real advantages over JSON: it supports binary strings, it's a monument to Carsten Bormann's ego, and data encoded in CBOR takes slightly fewer bytes than the same data encoded in JSON. (The second is only an advantage if you're Carsten Bormann.)
1) there's a distinction between integers and floating point values;
2) you can semantically tag values (yes, this is a text string, but treat it as a date; this is a binary string, but treat it as a big number; etc.);
3) you can have maps with non-text keys.
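Those three points are visible directly in the wire format. Below is a minimal, encode-only sketch of a CBOR subset (integers, float64, text and byte strings, arrays, maps with arbitrary keys, tags), following the initial-byte layout of RFC 7049: a 3-bit major type plus 5 bits of additional info. It's an illustration, not a complete or canonical encoder: it always emits 64-bit floats and has no bignum support. The `Tag` class is a hypothetical helper.

```python
import struct
from typing import Any


class Tag:
    """Hypothetical wrapper for a tagged value, e.g. Tag(0, "2013-03-21T20:04:00Z")."""
    def __init__(self, number: int, value: Any) -> None:
        self.number, self.value = number, value


def _head(major: int, arg: int) -> bytes:
    # Initial byte = major type (top 3 bits) | additional info (low 5 bits).
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("argument too large for this sketch (no bignums)")


def encode(obj: Any) -> bytes:
    if isinstance(obj, bool):  # check before int: bool is an int subclass
        return b"\xf5" if obj else b"\xf4"
    if obj is None:
        return b"\xf6"
    if isinstance(obj, int):
        # Major type 0 = unsigned; major type 1 stores -1 - n for negatives.
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, float):
        return b"\xfb" + struct.pack(">d", obj)  # distinct from any integer
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        # Keys are encoded like any other value, so they needn't be text.
        return _head(5, len(obj)) + b"".join(
            encode(k) + encode(v) for k, v in obj.items())
    if isinstance(obj, Tag):
        return _head(6, obj.number) + encode(obj.value)
    raise TypeError(f"cannot encode {type(obj).__name__}")
```

The outputs can be checked against the worked examples in the RFC's appendix, e.g. `encode(Tag(0, "2013-03-21T20:04:00Z"))` gives the tag-0 date/time encoding listed there, and `encode(1)` and `encode(1.0)` produce different bytes.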
I'm not sure what Carsten Bormann's ego has to do with CBOR, but I found RFC 7049 one of the better-written specs, with plenty of encoding examples. It made it really easy to write an encoder/decoder [1] and use the examples as test cases.
All three of those could be advantages under some circumstances, but I've more often found them to be disadvantages. What do you do with maps with non-text keys when you're deserializing in JS or Perl? For that matter, what do you do in Python when the key of a map is a map? When you have a date, do you decode it as a datetime object, as a text string, or as some kind of wrapper object that gives you both alternatives?
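For Python specifically, the standard `json` module shows both sides of that question: integer keys are silently turned into strings (so a round trip doesn't return the same dict), and a decoded map can't be used as a dict key at all until it's converted into something hashable. The `freeze` helper is a hypothetical workaround, one of several possible choices, and it's exactly the kind of decision a decoder for non-text keys would force on you.

```python
import json
from typing import Any


def roundtrip(obj: Any) -> Any:
    return json.loads(json.dumps(obj))


# JSON object keys are always strings, so int keys come back as str.
original = {1: "one", 2: "two"}
restored = roundtrip(original)


def freeze(value: Any) -> Any:
    """Hypothetical helper: turn dicts/lists into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((freeze(k), freeze(v)) for k, v in value.items())
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value


# A plain dict is unhashable, so a map-keyed map needs some conversion first.
map_key = {"x": 1, "y": 2}
table = {freeze(map_key): "example"}
```

Here `restored` is `{"1": "one", "2": "two"}`, and `json.dumps` refuses tuple keys outright with a `TypeError`.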
I agree that having lots of examples in the spec is good.
> What do you do with maps with non-text keys when you're deserializing in JS or Perl?
Um, use another language? I use Lua, which can deal with non-text keys. As for decoding dates (if they're semantically tagged, which you can with CBOR) I convert it to a datetime object, on the grounds that if I care about tagged dates, I'm going to be using them in some capacity.
That's not to say you have to use all of CBOR's flexibility. But for me, having distinct integer and floating-point values, plus distinct text and binary data, is enough of a win to use it over JSON.
While theoretically true, in practice the actual character parsing tends to be a small to negligible part of the overall time. That's reflected in the measurable fact that on macOS/iOS, the system JSON serialization is actually one of the fastest options, faster than the binary ones.
1: https://github.com/xi-editor/xi-editor/blob/master/README.md...