by bri3d 644 days ago
Interesting way to approach this (dictionary based compression over JSON and Erlang ETF) vs. moving to a schema-based system like Cap'n Proto or Protobufs where the repeated keys and enumeration values would be encoded in the schema explicitly.

Also would be interested in benchmarks between Zstandard vs. LZ4 for this use case - for a very different use case (streaming overlay/HUD data for drones), I ended up using LZ4 with dictionaries produced by the Zstd dictionary tool. LZ4 produced similar compression at substantially higher speed, at least on the old ARM-with-NEON processor I was targeting.
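To make the shared-dictionary idea concrete, here is a minimal sketch. It uses zlib's preset-dictionary support from the Python standard library as a stand-in, not Zstandard or LZ4, and it hand-builds the dictionary instead of training one with the Zstd tool. The message shape and field names are made up for illustration.

```python
import json
import zlib

# Hypothetical gateway-style messages; the field names are illustrative only.
messages = [
    {"op": 0, "t": "MESSAGE_CREATE",
     "d": {"channel_id": str(1000 + i), "content": f"hello {i}"}}
    for i in range(5)
]

# Preset dictionary: bytes expected to recur in every message.
# Hand-built here; a real setup would train it from sample traffic.
shared_dict = json.dumps(
    {"op": 0, "t": "MESSAGE_CREATE", "d": {"channel_id": "", "content": ""}}
).encode("utf-8")


def compress(payload, zdict=None):
    """Deflate one message, optionally primed with a preset dictionary."""
    if zdict is None:
        c = zlib.compressobj(level=9)
    else:
        c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(payload) + c.flush()


def decompress(blob, zdict=None):
    """Inflate one message; the same dictionary must be supplied if one was used."""
    d = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


raw = [json.dumps(m).encode("utf-8") for m in messages]
plain_total = sum(len(compress(r)) for r in raw)
dict_total = sum(len(compress(r, shared_dict)) for r in raw)
print("without dictionary:", plain_total, "bytes; with dictionary:", dict_total, "bytes")
```

Each message is compressed independently here, which is where a dictionary helps most: short payloads have little internal repetition, so the repeated keys have to come from somewhere outside the message.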

I guess it's not totally wild but it's a bit surprising that common bootstrapping responses (READY) were 2+MB, as well.


They use JSON over the wire and not a binary protocol? That's madness and reminds me of XML / Jabber.

Protos or a custom wire protocol would be far better suited to the task.

Hard disagree given the constraints. Every bot is also consuming the Discord API, and forcing third-party devs, many of whom aren't particularly advanced coders, to suddenly deal with a binary wire format would be painful, especially if you needed to constantly update a proto file. Their API is also part WebSocket, part HTTP, with many methods doing double duty.
To be fair, this is exactly what the Accept and Content-Type standard HTTP headers are for. Clients can tell the API "OK, send me application/json data instead of binary data" or vice versa. You can have the majority of your traffic (client traffic) using the binary format, and still support JSON for bot API usage. This is standardized for both WebSockets and HTTP.
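A rough sketch of what that negotiation could look like on the server side. Everything here is hypothetical: the encoder table, the choice of `application/octet-stream`, and the "binary" encoder, which is just deflate-compressed JSON standing in for a real binary format. The header parsing is deliberately simplified and ignores q-values and wildcards.

```python
import json
import zlib


def encode_json(obj):
    """Compact JSON encoding."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def encode_binary(obj):
    """Stand-in for a real binary format: deflate-compressed JSON."""
    return zlib.compress(encode_json(obj))


# Hypothetical table of media types this server can produce.
ENCODERS = {
    "application/json": encode_json,
    "application/octet-stream": encode_binary,
}


def negotiate(accept_header, default="application/json"):
    """Return the first supported media type listed in the Accept header."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in ENCODERS:
            return media_type
    return default


def respond(accept_header, payload):
    """Encode one payload in whichever format the client asked for."""
    content_type = negotiate(accept_header)
    return content_type, ENCODERS[content_type](payload)


print(respond("application/json", {"a": 1}))
print(respond("application/octet-stream, application/json", {"a": 1})[0])
```

The handlers only ever build the payload once; the format choice is confined to `respond`, which is the part that keeps this from becoming two separate APIs.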
Is there a way to do this that doesn't require keeping two sets of books for every API? Because the JSON API is right now the canonical one and still has to work. I don't imagine the lift is worth it for the difference between compressed JSON and BSON.

Also, how much can you realistically win when the payload of the small messages, which is where the difference matters, is mostly text?

Modern serialization libraries make supporting multiple serialization formats pretty transparent - of course, I'm not sure what the current situation is in Elixir land, which Discord seem to be using, but Go and Rust (as trivial examples) have serialization libraries which make serialization based on content negotiation pretty much transparent. Of course, this doesn't help with testing, you'll still need to be testing both content types separately, but the savings in bandwidth might just be worth it.
Moreover, I imagine a lot of these bots are built on top of an SDK instead of working directly with API calls, so it would just be a matter of changing the SDK internals.
Websockets don't support headers per spec; Discord gets around this by using a query parameter, though.
The initial handshake request is still over regular HTTP, though, which is where I'd assume you'd want to agree upon which content type you'll be sending anyways.
A RIFF-like format would not be that hard to parse section by section. You get to ignore the parts you don't recognise and decode the parts you do.
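As a sketch of that idea, assume a simplified chunk layout: a 4-byte tag, a 4-byte little-endian length, then the body. Real RIFF also pads chunks to an even size, which is left out here. The tags `USER`, `SEQ ` and `XTRA` are invented for the example.

```python
import struct


def write_chunk(tag, body):
    """Serialize one chunk: 4-byte tag, 4-byte little-endian length, body."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    return tag + struct.pack("<I", len(body)) + body


def read_chunks(data):
    """Yield (tag, body) pairs from a concatenated chunk stream."""
    offset = 0
    while offset + 8 <= len(data):
        tag = data[offset:offset + 4]
        (length,) = struct.unpack_from("<I", data, offset + 4)
        body = data[offset + 8:offset + 8 + length]
        if len(body) != length:
            raise ValueError("truncated chunk")
        yield tag, body
        offset += 8 + length


# Hypothetical decoders for the chunk types this reader understands.
HANDLERS = {
    b"USER": lambda body: body.decode("utf-8"),
    b"SEQ ": lambda body: struct.unpack("<I", body)[0],
}


def decode(data):
    """Decode known chunks and silently skip unknown ones."""
    out = {}
    for tag, body in read_chunks(data):
        handler = HANDLERS.get(tag)
        if handler is None:
            continue  # unknown chunk: the length field lets us step over it
        out[tag] = handler(body)
    return out


stream = (
    write_chunk(b"USER", b"alice")
    + write_chunk(b"XTRA", b"\x00\x01\x02")  # a chunk this reader does not know
    + write_chunk(b"SEQ ", struct.pack("<I", 7))
)
print(decode(stream))
```

The length prefix is what makes this forward-compatible: an older reader can skip a newer chunk type without understanding its contents.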

Moving to a binary format would be better for 99.9% of users and would be a slight inconvenience to a few people creating bots. Discord could easily publish a library for reading the format if needed.

I don't understand why so many protocols that expect to handle large amounts of data don't default to a binary schema. JSON is fine on the edges, but the wire format between nodes is not the edge.
I assume it’s mostly because it’s way easier to debug json over websockets and http with browser devtools instead of custom protocols.

Custom protocol would be binary.

They could make a custom extension but it wouldn’t be that easy.

I worked on browser devtools for IE and edge.

Even chrome/vscode use jsonrpc over websockets for ease of development.

Wouldn’t the ETF (Erlang Term Format) suffice in this case?

IIRC it’s used in the desktop client and some community libraries (specifically JDA) have support for it.

There were some quirks regarding ETF usage with Discord’s Gateway but I can’t recall at the moment.

Erlang terms are, to a first approximation, the same as JSON for most relevant communication metrics.

To a second approximation it gets more complicated. Atoms can save you some repetition of what would otherwise be strings, because they are effectively interned strings and get passed as tagged integers under the hood, but it's fairly redundant to what compression would get you anyhow, and erlang term representations of things like dictionaries can be quite rough:

    3> dict:from_list([{a, 1}, {b, 2}]).
    {dict,2,16,16,8,80,48,
          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
          {{[],
            [[a|1]],
            [[b|2]],
            [],[],[],[],[],[],[],[],[],[],[],[],[]}}}
Even with compression that's a loss compared to '{"a":1,"b":2}'.

Plus, even if you're stuck with JSON, one of the first things you do is shorten all your keys to the point they're as efficient as tagged integers on the wire anyhow ("double-quote x double-quote" can even beat an integer, that's only 3 bytes). Doesn't take a genius to note that "the_message_after_it_has_been_formatted_by_the_markdown_processor" can be tightened up a bit if bandwidth is tight.
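A quick sketch of the key-shortening trick, assuming a flat record and a hand-maintained key table. The table, the second field name, and the sample values are hypothetical; nested objects would need a recursive version.

```python
import json

# Hypothetical mapping from descriptive field names to short wire keys.
SHORT_KEYS = {
    "the_message_after_it_has_been_formatted_by_the_markdown_processor": "x",
    "author_identifier": "a",
}
LONG_KEYS = {short: long for long, short in SHORT_KEYS.items()}


def shorten(record):
    """Rename known keys to their short wire form (flat records only)."""
    return {SHORT_KEYS.get(k, k): v for k, v in record.items()}


def expand(record):
    """Restore the descriptive names on the receiving side."""
    return {LONG_KEYS.get(k, k): v for k, v in record.items()}


def wire_size(record):
    """Size in bytes of the compact JSON encoding."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))


record = {
    "the_message_after_it_has_been_formatted_by_the_markdown_processor": "<b>hi</b>",
    "author_identifier": 42,
}
print("long keys:", wire_size(record), "bytes")
print("short keys:", wire_size(shorten(record)), "bytes")
```

This only measures the uncompressed size. Once compression is in play the gap narrows, since repeated long keys are exactly what a compressor removes.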

It isn't clearly a loss over JSON but it is certainly not a clear win either. If you're converting from naive Erlang terms to some communication protocol encoding you're already paying for a total conversion and you might as well choose from the full set of options at that point.

Sorry, I want to apologize: I made an error in my initial statement and meant Erlang _External_ Term Format instead of Erlang Term Format.

Does this change anything w.r.t your response?