by poettering 624 days ago
The marshalling cost for JSON is negligible. Yes, it might be a bit slower than GVariant for example, but only by some fractional linear factor. And on small messages (which D-Bus messages currently always are, due to message size constraints enforced by the broker) the difference is impossible to measure. To a point it really doesn't matter, in particular as JSON parsers have been ridiculously well optimized in this world.

What does matter though are roundtrips. In Varlink far fewer are required for typical ops than in D-Bus. That's because D-Bus implies a broker (which doubles the number of roundtrips), but also because D-Bus forces you into a model of sending smaller "summary" messages when enumerating plus querying "details" for each listed object, because it enforces transfer rate limits on everything (if you hit them, you are kicked off the bus), which means you have to refrain from streaming too much data.

Or in other words: marshalling is quite an irrelevant minor detail when it comes to performance; you must look at roundtrips, and the context switches they cause, instead.
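To make the roundtrip argument concrete, here is a toy counting model, not a measurement. It assumes a brokered "list, then query each object" pattern where every call and every reply crosses the broker, and compares it with a single direct call answered by a stream of replies. The function names and the 4-hops-per-call figure are assumptions made for illustration.

```python
def brokered_enumerate(num_objects: int) -> tuple[int, int]:
    """Return (blocking roundtrips, message hops) for a brokered enumeration.

    Assumed model: one summary call plus one detail call per object. Each
    call travels client -> broker -> service and each reply travels
    service -> broker -> client, so one roundtrip costs 4 hops.
    """
    roundtrips = 1 + num_objects
    return roundtrips, roundtrips * 4


def direct_streaming(num_objects: int) -> tuple[int, int]:
    """Return (blocking roundtrips, message hops) for one streamed call.

    Assumed model: the client sends one request directly to the service,
    which answers with one reply message per object (at least one reply).
    The client only waits once, at the start.
    """
    replies = max(num_objects, 1)
    return 1, 1 + replies


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, brokered_enumerate(n), direct_streaming(n))
```

Under these assumptions, the number of times the client has to stop and wait grows with the number of objects in the brokered case and stays constant in the streamed case, regardless of how each message is marshalled.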

Using JSON for this has two major benefits: the whole world speaks JSON, and modern programming languages typically do so pretty natively. And it's directly readable in tools such as strace. With a simple "strace" I can now reasonably trace my programs, which a binary serialization will never allow you to do. And if you tell me that that doesn't matter, then you apparently live in an entirely different world than I do, because in mine debuggability does matter. A lot. Probably more than most other things.

Lennart

11 comments

The retrospective on the Xi editor project has a few notes on issues with JSON (at least in the context of Rust and Swift).

This is from someone who initially seemed to have a very similar perspective to you.

See the JSON section of the post: https://raphlinus.github.io/xi/2020/06/27/xi-retrospective.h...

To save others the click: Their issues were simply that Swift has no fast JSON impl, and in Rust, when using serde (most popular library handling JSON marshalling), it leads to binaries getting a bunch bigger. That's it. So yeah, same perspective -- unless either of the above matter in your case (in 90%+ of cases they don't), JSON is just fine from a perf perspective.
Serde is a rather chunky dependency, it's not just a matter of binaries getting bigger, but also compile times being dramatically slower.

IMO CBOR would be a better choice, you aren't limited to IEEE 754 floats for your numeric types. Yeah, some (de/en)coders can handle integer types, but many won't, it's strictly out of spec. I don't think building something as fundamental to an OS as this on out-of-spec behavior is a great idea. It will result in confusion and many wasted hours sooner or later.

> CBOR would be a better choice, you aren't limited to IEEE 754 floats for your numeric types.

The other side of this coin, of course, is that now you have to support those other numeric types :) My usual languages of choice somehow don't support "negative integers in the range -2^64..-1 inclusive".

I mean, you don't have to support those? You still would need something on the other end to produce that type of datatype, which can be documented that it will never happen: you're making an interface anyways. The problem is if you literally don't have the option to represent common datatypes it will be a problem, not a hypothetical one just because the encoding layer can support it. Those are different problems.
And JSON, technically, allows use of unlimited-precision fractions, but also allows implementations to set arbitrary limits (it actually does, you're not required to parse JSON numbers as doubles). So the situation is not really different from CBOR, is it? Just™ make both sides agree to stick to some common subset (e.g. integers-in-int64_t-range-only for some fields) and you're done; no need to support double-precision floating point numbers.
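As a sketch of what "agree on a common subset" could look like on the receiving side, the following uses the hooks of Python's standard `json` module to accept only integers in the signed 64-bit range and to reject every other kind of number. The function name `parse_int64_only` and the policy itself are assumptions for illustration, not anything Varlink or systemd prescribes.

```python
import json

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def _reject_number(text: str):
    # Called for floats ("1.5", "1e3") and for NaN/Infinity constants.
    raise ValueError(f"non-integer number not allowed: {text}")


def _checked_int(text: str) -> int:
    # Called with the literal digits of every JSON integer.
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"integer out of int64 range: {text}")
    return value


def parse_int64_only(document: str):
    """Parse JSON, accepting only numbers that are int64-range integers."""
    return json.loads(
        document,
        parse_int=_checked_int,
        parse_float=_reject_number,
        parse_constant=_reject_number,
    )


if __name__ == "__main__":
    print(parse_int64_only('{"size": 9223372036854775807}'))
```

Both peers would need to apply the same rule for this to be a real contract; the sketch only shows one side enforcing it.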
I read the slides, and I found it refreshing that you said at the end: don't create per-language bindings for the libraries shipped with systemd, but simply use a JSON parser for your language. That underlined that you've specified a simple protocol.
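For readers wondering how little is needed beyond a JSON parser: Varlink messages are JSON objects, each terminated by a single NUL byte. A minimal sketch of framing and unframing with only the standard library follows. The method name `org.example.Ping` and its parameters are hypothetical, and a real client would also need socket handling and error replies, which are left out here.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize one message as compact JSON followed by a single NUL byte."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\0"


def decode_messages(buffer: bytes) -> list[dict]:
    """Parse every complete (NUL-terminated) message in a receive buffer.

    Anything after the last NUL is an incomplete message and is ignored
    here; a real client would keep it for the next read.
    """
    *complete, _partial = buffer.split(b"\0")
    return [json.loads(chunk) for chunk in complete]


# Hypothetical method name and parameters, for illustration only.
call = {"method": "org.example.Ping", "parameters": {"ping": "hello"}}
wire = encode_message(call)

if __name__ == "__main__":
    print(wire)
    print(decode_messages(wire))
```

The bytes on the wire are plain text apart from the trailing NUL, which is what makes them legible in a syscall trace.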

Also, there have clearly also been several attempts over the years to make a faster D-Bus implementation (kdbus, BUS1), which were never accepted into the kernel. It makes a lot of sense to instead design a simpler protocol.

There is clearly also a cautionary tale about how microbenchmarks (here, for serialisation) can mask systemic flaws (lots of context switches with D-Bus, especially once polkit had to be involved).

The danger I see is that JSON has lots of edge behavior around deserialization, and some languages will deserialize a 100 digit number differently. If the main benefit is removing the broker and the need for rate limiting - it could have been accomplished without using JSON.
You are writing this as if JSON was a newly invented thing, and not a language that has become the lingua franca of the Internet when it comes to encoding structured data. Well understood, and universally handled, since 1997.

A 100 digit number cannot be encoded losslessly in D-Bus btw, nor in the vast majority of IPC marshallings in this world.

Having done systems-level OS development for 25 years or so, I never felt the burning urge to send a 100 digit number over local IPC.

Not that 100 digit numbers aren't useful, even in IPC, but typically, that's a cryptography thing, and they generally use their own serializations anyway.

You are writing this as if security was a newly invented thing. Having done systems level security development for 12 years, anything that can be produced maliciously will be. By using JSON, you've invented a new vulnerability class for malicious deserialization attacks.

Actually, not new. Earliest CVE I found was from 2017, which feels a decade later than it should be. I guess no one thought of pushing JSON over trusted interfaces, and probably for good reason.

> A 100 digit number cannot be encoded losslessly in D-Bus btw

I think the concern is that large numbers can in fact be encoded in JSON, but there is no guarantee that they will be decoded correctly by a receiver as the format is underspecified. So you have to cater for the ill defined common denominator.

You should probably encode large numbers as strings.
The format is properly specified; its mapping onto actual hardware people use is not.
Honestly, the only thing that surprises me is you're being pedantic, and encoding int64s as strings.

I know you know JSON is nominally only 53-bit safe, because JS numbers are doubles. But in practice I'd wager most JSON libraries can handle 64-bit integers.
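The 53-bit point is easy to demonstrate. In this sketch, Python's `json` module parses integers exactly by default, and passing `parse_int=float` is used to emulate a parser that stores every number as an IEEE 754 double. The field name `id` is made up for the example.

```python
import json

big = 2**53 + 1  # 9007199254740993, the first integer a double cannot hold
text = json.dumps({"id": big})

# Default behaviour: Python ints are arbitrary precision, so nothing is lost.
exact = json.loads(text)["id"]

# Emulate a doubles-only parser by converting every integer literal to float.
as_double = json.loads(text, parse_int=float)["id"]

# Workaround from the thread: ship the number as a string instead.
as_string = json.loads(json.dumps({"id": str(big)}))["id"]

if __name__ == "__main__":
    print(exact, as_double, as_string)
```

The same JSON text yields different values depending on the receiving parser, which is why the string encoding is sometimes used even for values that fit in an int64.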

What if varlink supported both JSON and a subset of cbor with the "cbor data follows" tag at the beginning (so the server can determine if it is json or cbor based on the beginning of the message)?

It would add a little complexity to the server, but then clients can choose if they want to use a human readable format that has more available libraries or a binary format.
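A rough idea of the extra server-side logic, as a sketch only: CBOR defines a "self-described CBOR" tag (55799 in RFC 8949) whose encoding is the three bytes 0xD9 0xD9 0xF7, and a JSON object starts with `{`. The function name `detect_encoding` is hypothetical, and nothing here reflects an actual Varlink feature.

```python
# Encoding of CBOR tag 55799 ("self-described CBOR", RFC 8949).
CBOR_SELF_DESCRIBE = bytes([0xD9, 0xD9, 0xF7])


def detect_encoding(message: bytes) -> str:
    """Guess whether one message is CBOR or JSON from its leading bytes."""
    if message.startswith(CBOR_SELF_DESCRIBE):
        return "cbor"
    # JSON permits insignificant whitespace before the opening brace.
    if message.lstrip(b" \t\r\n").startswith(b"{"):
        return "json"
    raise ValueError("unrecognized message encoding")


if __name__ == "__main__":
    print(detect_encoding(b'{"method":"org.example.Ping"}\0'))
    print(detect_encoding(CBOR_SELF_DESCRIBE + b"\xa0"))  # tag + empty map
```

The dispatch itself is small; the larger cost would be carrying and testing a second decoder behind it.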

As for strace, tooling could probably be added to automatically decode cbor to json, either as part of strace, or in a wrapper.

There could also be a varlink proxy (similar to varlink bridge) that could log or otherwise capture requests in a human readable format.

But why though? Is this really a performance critical bus?
Yes. I run shared desktop login server clusters for students, and D-Bus is a bottleneck.
>With a simple "strace" I can now reasonably trace my programs, which a binary serialization will never allow you

Doesn't systemd use binary logging?

Not really. We use two text based formats for logging: BSD syslog, and systemd's structured logging (which is basically an env block, i.e. a key-value set, with some tweaks). Programs generate text logs, journald reads text logs hence. Programs that read from the journal get the text-based key/value stuff back, usually.

(Yes, we then store the structured log data on disk in a binary format. Lookup indexes are just nasty in text based formats).

Hence, not sure what the Journal has to do with Varlink, but any IPC that the journal does is text-based, and very nicely strace'able in fact, I do that all the time.

[Maybe, when trying to be a smartass, try to be "smart", and not just an "ass"?]

Sure the interface with the log might be text based, but my understanding is that the at rest format is binary and you need specialized tools to read it, standard unix grep is not going to cut it.

Although I use strace all the time, I hardly ever look at the payload of read and write calls, although I could see why it would be useful. But given a binary protocol it wouldn't be terribly hard to build a tool that parses the output of strace.

> [Maybe, when trying to be a smartass, try to be "smart", and not just an "ass"?]

thanks for the kind words and elevating the tone of the discussion.

> ...it wouldn't be terribly hard to build a tool that parses the output of strace.

Nah, it'd be a rage-inducing nightmare.

The marshalling cost might be negligible for some use cases, but the bandwidth usage definitely is not. I think the best interface description protocol is one where the serialization format is unspecified. Instead, the protocol describes how to specify the structure, exchange sequences, and pre/post-conditions. A separate document describes how to implement that specification with a certain over the wire format. That way the JSON folks can use JSON when they want (unless they are using large longs), and other folks can use what they want (I like CBOR).
> With a simple "strace" I can now reasonably trace my programs, which a binary serialization will never allow you.

Doesn't strace already come with deserializers for many common data structures?

Is this true of future desktop use cases where every basic function will cause a torrent of traffic on it? Or are you talking only from the point of view of a server starting/stopping services?
“The marshalling cost for JSON is negligible”

I’ve worked with profiling code where the marshaling cost for JSON was the biggest cost. Namely it involved a heap allocation and copying a ton more data than was actually needed, and I ended up fixing it by turning the JSON into a static string and dropping the values in manually.
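The "static string with values dropped in" trick can be sketched as follows. This only shows the shape of the technique: the message layout and field names are invented, and in Python it says nothing about the performance gain described above. It is safe here only because the variable fields are integers, which need no JSON escaping.

```python
import json

# Fixed JSON text; only the two integer fields change between messages.
TEMPLATE = '{"event":"tick","seq":%d,"value":%d}'


def render_from_template(seq: int, value: int) -> str:
    """Fill integers into a pre-built JSON string, skipping the serializer."""
    return TEMPLATE % (seq, value)


def render_generic(seq: int, value: int) -> str:
    """Build the same message through the general-purpose serializer."""
    message = {"event": "tick", "seq": seq, "value": value}
    return json.dumps(message, separators=(",", ":"))


if __name__ == "__main__":
    print(render_from_template(1, 42))
    print(render_generic(1, 42))
```

As soon as a field can contain arbitrary strings, the template approach needs proper escaping and loses most of its simplicity.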

The systemd maintainers have probably done their due diligence and concluded that it isn’t an issue for their foreseeable use cases, but it does lock everything in to doing string processing when interfacing with systemd, which is probably unnecessary. And you can’t trivially swap systemd out for something else.

systemd is so pervasive that it would be fine to add a binary-format-to-JSON translation ability into strace. That shifts the cost of debugging to the debug tools, rather than slowing down production code.

Doing any string processing tends to require a lot of branching, and branch mispredictions are most likely to slow down code. It also turns every 1-cycle load/store instruction into N cycles.

String processing in C, which is what systemd and a lot of system tools are written in, is pretty abysmal.

systemd is also non-optional, so if it turns out that it’s causing cache thrashing by dint of something generating a lot of small events, it’s not something you can do anything about without digging into the details of your low-level system software or getting rid of systemd.

And it’s potentially just that much more waste on old or low-power hardware. Sure, it’s probably “negligible”, but the effort required to do anything more efficient is probably trivial compared to the aggregate cost.

And yeah, it may be better than D-Bus, but “it’s not as bad as the thing that it replaced” is pretty much the bare minimum expectation for such a change. I mean, if you’re swapping out things for something that’s even worse, what are you even doing?

I see there’s a TCP sidechannel, but why increase the complexity of the overall system by having two different channels when you could use one?

Dunno. This isn’t really an area that I work in, so I can’t say for sure it was the wrong decision, but the arguments I hear being made for it don’t seem great. For something fundamental like systemd, I’d expect it to use a serialization format that prioritizes being efficient and strongly-typed with minimal dependencies, rather than interoperability within the application layer with weakly-typed interpreted languages. This feels like a case of people choosing something they’re more personally familiar with than what’s actually optimal (and again, the reason I’d consider being optimal in this case being worth it is because this is a mandatory part of so many devices).

EDIT: Also, the reason that binary serialization is more efficient is because it’s simpler - for machines. JSON looks simpler to humans, but it’s actually a lot more complex under the hood, and for something fundamental having something simple tends to be better. Just because there’s an RFC out there that answers every question you could possibly have about JSON still doesn’t mean it’s as good as something for which the spec is much, much smaller.

JSON’s deceptive simplicity also results in people trying to handroll their own parsing or serialization, which then breaks in edge cases or doesn’t quite 100% follow the spec.

And just because you’re using JSON doesn’t force C/C++ developers to validate it; someone can still use an atoi() on an incoming string because “we only need one thing and it avoids pulling in an extra dependency for a proper json parser”, which then breaks when a subsequent version of systemd changes the message. Etc. If the goal is to avoid memory safety issues in C/C++, using more strings is not the answer.

But .. how can anyone use strace? That's not JSON. And serialization is cheap!
Do you know what strace is?

It's a command that prints the system calls (done via library). So you see write(55, "[1,2,3,4]\n")

Let me try again. You do see

  write(55, "[1,2,3,4]\n", 10)

and not

  {
    "syscall": "write",
    "params": {
      "fd": 55,
      "buf": "[1,2,3,4]\n",
      "len": 10
    }
  }

which would obviously be much better!

It can be parsed by any modern language, and deserialization is cheap.

Can we please rewrite strace with this in mind? Preferably in Rust.

How is the 2nd much better? Considering that a trace contains more than 1 single line?

Anyway they'd probably accept a patch for json output format. I don't think it's so difficult to do.

You can ask them in advance if they'd be willing to accept it, before starting to write it.
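For comparison, a wrapper outside strace could produce the second form from the first. The following is a minimal sketch that handles only a single-line write() call shaped like the one quoted above; it is not an strace feature, and it does not cope with truncated strings, unfinished calls, or any other syscall.

```python
import json
import re

# Matches lines such as: write(55, "[1,2,3,4]\n", 10) = 10
WRITE_LINE = re.compile(
    r'^(?P<syscall>write)\((?P<fd>\d+), "(?P<buf>(?:[^"\\]|\\.)*)", (?P<len>\d+)\)'
)


def strace_write_to_json(line: str) -> str:
    """Convert one write() trace line into a JSON record."""
    match = WRITE_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a recognized write() line")
    # Undo C-style escapes such as \n and \" inside the quoted buffer.
    buf = match["buf"].encode("ascii").decode("unicode_escape")
    record = {
        "syscall": match["syscall"],
        "params": {"fd": int(match["fd"]), "buf": buf, "len": int(match["len"])},
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(strace_write_to_json(r'write(55, "[1,2,3,4]\n", 10) = 10'))
```

Even this narrow case needs escape handling, which gives some sense of how much work a general converter would be.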

i honestly don't really get the angle of debugging via strace - i'd much rather prefer something more wireshark-like, where I can see all messages processes are sending to each other, since that would make it easier to decipher cases where sending a message to a service causes it to send other messages to its backends