| HN Mirror



	by doublement 2374 days ago
	This is going to sound sarcastic but it's not: Can we get back to just putting the members of C structures into network byte order and sending that over the wire in binary, à la 1995?

13 comments

btown 2374 days ago

This is more or less what https://capnproto.org/ does.

link

davedx 2374 days ago

I think this is also what Rust's bincode crate does. It's very sane and you can just open up files in a hex editor and see what's there.

https://crates.io/crates/bincode

link

omginternets 2374 days ago

From the capnproto docs:

>Isn’t this all horribly insecure?

>No no no! To be clear, we’re NOT just casting a buffer pointer to a struct pointer and calling it a day.

Isn't this a direct contradiction to your claim? Or have I misunderstood them?

link

jtolmar 2374 days ago

IIRC: capnproto generates messages that you could deserialize by casting them to the right struct, but refrains from actually doing it that way. Instead it generates a bunch of accessor methods that parse the data, as if you were reading something that's not basically a C struct, like a protobuf.

link

kentonv 2374 days ago

That's basically correct. Cap'n Proto generates classes with inline accessor methods that do roughly the same pointer arithmetic that the compiler would generate for struct access.

There's a couple subtle differences:

* The struct is allowed to be shorter than expected, in which case fields past the end are assumed to have their schema-defined default values. This is what allows you to add new fields over time while remaining forwards- and backwards-compatible.

* Pointers are in a non-native format. They are offset-based (rather than absolute) and contain some extra type information (such as the size of the target, needed for the previous point). Following a pointer requires validating it for security.

(Disclosure: I'm the author of Cap'n Proto.)
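
A minimal sketch of the two differences listed above, in plain C. This is not Cap'n Proto's generated code or its real pointer encoding; `struct_view`, `view_get_u32`, and `resolve_offset` are hypothetical names, and the little-endian field layout is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a received struct: the bytes plus the size the
 * sender actually wrote (which may be smaller than the reader expects). */
struct struct_view {
    const uint8_t *data;
    size_t size;
};

/* Read a 32-bit little-endian field at `offset`. If the struct is too
 * short to contain the field, return the schema default instead. */
uint32_t view_get_u32(struct struct_view v, size_t offset, uint32_t dflt) {
    if (offset > v.size || v.size - offset < 4) {
        return dflt;
    }
    const uint8_t *p = v.data + offset;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Resolve an offset-based reference: the target lives `rel` bytes away
 * from position `pos`. Returns NULL unless the whole target
 * (`target_size` bytes) lies inside the buffer. */
const uint8_t *resolve_offset(const uint8_t *base, size_t len, size_t pos,
                              int32_t rel, size_t target_size) {
    int64_t target = (int64_t)pos + rel;
    if (target < 0 || (uint64_t)target > len ||
        len - (size_t)target < target_size) {
        return NULL;
    }
    return base + target;
}
```

A reader built against a newer schema can call `view_get_u32(v, 4, 99)` on a 4-byte struct from an older sender and get 99 back, rather than reading past the end.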

link

asveikau 2374 days ago

Re-read the comment I think. It doesn't say casting a struct pointer. It says putting the members of the struct into network byte order over the wire. I read that as individually serializing each member in a portable, safe way.

Anyway even if you do choose the struct pointer hack (which I do not see advocated here) it can be done relatively well albeit requiring language extensions and a bit of care. Pragmas and attributes to ensure zero padding and alignment between members. No pointer members. Checking sizes and offsets after a read (the hardest part).
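
A rough sketch of the "bit of care" described above, assuming C11 and a made-up header layout. `wire_header` and `wire_header_read` are hypothetical; the fields are ordered so that natural alignment leaves no padding, which the static assertions check at compile time. Byte-order conversion is left out to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-layout header with no pointer members. */
struct wire_header {
    uint32_t magic;
    uint16_t version;
    uint16_t payload_len;   /* bytes that follow the header */
};

/* Fail the build if the compiler lays the struct out differently. */
_Static_assert(sizeof(struct wire_header) == 8, "unexpected padding");
_Static_assert(offsetof(struct wire_header, version) == 4, "bad offset");
_Static_assert(offsetof(struct wire_header, payload_len) == 6, "bad offset");

/* Copy the header out of a received buffer, then check the length field
 * against what actually arrived. Returns 0 on success, -1 if malformed. */
int wire_header_read(const uint8_t *buf, size_t buf_len,
                     struct wire_header *out) {
    if (buf_len < sizeof *out) {
        return -1;
    }
    memcpy(out, buf, sizeof *out);
    if (out->payload_len > buf_len - sizeof *out) {
        return -1;
    }
    return 0;
}
```

Using `memcpy` rather than a pointer cast sidesteps alignment and aliasing problems; the runtime check on `payload_len` is the "checking sizes and offsets after a read" step.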

link

Animats 2374 days ago

"As of this writing, Cap’n Proto has not undergone a security review, therefore we suggest caution when handling messages from untrusted sources."

Something like that has to be rigorously tested or proven to be free of buffer overflows. It's so easy to attack with malformed messages. Parsers for remote messages are a classic source of vulnerabilities. It's hard to test this, because it's a code generator.

This looks promising as an attack vector for a big system built on microservices. If you can find an exploit in this that lets you overwrite memory, and can break into some service of a set of microservices by other means, you can leverage that into a break-in of other services that thought their input was a trusted source.

The "zero overhead" claim goes away as soon as you send variable length items. Then there has to be some marshaling.

link

kentonv 2374 days ago

> As of this writing, Cap’n Proto has not undergone a security review

This is outdated, I should remove it. Cap'n Proto has been reviewed by multiple security experts, though not in a strictly formal setting. I trust it enough to rely on it for security in my own projects, but yeah, I am cautious about making promises to others...

> Something like that has to be rigorously tested or proven to be free of buffer overflows.

I've done a bunch of fuzz testing with AFL and by hand. I've also employed static analysis via template metaprogramming to catch some bugs. See:

https://capnproto.org/news/2015-03-02-security-advisory-and-...

(That was... almost five years ago.)

> The "zero overhead" claim goes away as soon as you send variable length items. Then there has to be some marshaling.

Space for messages is allocated in large blocks. The contents of the message are allocated sequentially in that space and constructed in-place. So once built, the message is already composed of a small number of contiguous memory segments (usually, one segment), which can then be written out easily. Or, if you're mmaping a file, you can have the blocks point directly into the memory-mapped space and avoid copying at all -- hence, zero-copy.

So no, there is no marshaling.
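
The sequential, in-place allocation described above can be illustrated with a simple bump allocator. This is only a sketch of the general idea, not Cap'n Proto's implementation; `segment` and `segment_alloc` are hypothetical names, and rounding to 8 bytes is an assumption chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous block of message memory. */
struct segment {
    uint8_t *base;
    size_t cap;    /* total bytes available */
    size_t used;   /* bytes handed out so far */
};

/* Hand out the next `n` bytes (rounded up to a multiple of 8) from the
 * segment, or NULL if it does not fit. Objects are built directly in the
 * returned memory, so the finished message is already contiguous. */
void *segment_alloc(struct segment *s, size_t n) {
    if (n > SIZE_MAX - 7) {
        return NULL;
    }
    size_t rounded = (n + 7) & ~(size_t)7;
    if (rounded > s->cap - s->used) {
        return NULL;
    }
    void *p = s->base + s->used;
    s->used += rounded;
    return p;
}
```

Once the message is built, writing it out is a single write of `s.base[0 .. s.used)`; pointing `base` into a memory-mapped file would avoid even that copy.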

link

lidHanteyk 2374 days ago

Capn is better than C at struct layout. We are not, under any circumstances, going back to the 90s. We are moving forward and learning from mistakes.

link

salgernon 2374 days ago

I would like to submit apples archaic “Rez”[1] as a great language for declaring binary formats. It was designed to be able to describe c and pascal structures.

[1] http://preserve.mactech.com/articles/mactech/Vol.14/14.09/Re...

link

sagarm 2373 days ago

The wire encoding for protos is much more compact than the in-memory representation, especially for sparsely populated messages (very common especially in mature systems).

You'd still have to figure out some way to serialize nested messages. Note that you can have recursive message definitions.

link

rbanffy 2374 days ago

> Can we get back to just putting the members of C structures into network byte order and sending that over the wire in binary, à la 1995?

I hope not (I know you are being sarcastic). We should use something that's trivial to implement correctly, as well as easy to read and to debug.

link

mikece 2374 days ago

Is that less of a configuration mess than WCF was? JSON isn't "The Magical Elixir" of data exchange and I'm more than open to something better but at least we (in the .NET community) have moved past the WCF configuration nightmares.

link

bob1029 2374 days ago

WCF is an unmitigated dumpster fire. We have actually written a non-WCF client that uses a raw HttpClient implementation with StringBuilder to compose SOAP envelopes around cached XMLSerializers in order to talk to other WCF services. First request delay went from 1-2 seconds down to a few milliseconds. Memory overhead is negligible now. Prior, you could watch task manager and immediately recognize when WCF is "warming up". Additionally, the XML serializer in .NET seems almost pathologically determined to ruin everything you seek to accomplish.

By comparison, JSON contracts are an absolute joy to work with. We still practice strong-typing on both sides of the wire (we control both ends), and have pretty much nothing to complain about. If you are concerned with space overhead w/ JSON, simply passing it through gzip can get you down to a very reasonable place for 99% of use cases. I understand that there are arguments to be made against JSON for extremely performance sensitive applications, but I would counter-argue that these are extremely rare in practice.

link

heavenlyblue 2372 days ago

Isn’t the problem with WCF rather than XML?

link

innagadadavida 2374 days ago

Protobufs have a pretty nice variable encoding integer wire format. This gives you the flexibility of saving space without doing compression.

While zero copy is nice, you cannot make it work when using compression.
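
For reference, a minimal sketch of the variable-length integer format mentioned above: each byte carries 7 bits of the value, least significant group first, and the high bit says whether more bytes follow. The function names are made up for this example and are not the protobuf library API.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `value` as a base-128 varint. `out` needs room for 10 bytes.
 * Returns the number of bytes written. */
size_t varint_encode(uint64_t value, uint8_t *out) {
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80);  /* low 7 bits + "more" flag */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;               /* final byte, flag clear */
    return n;
}

/* Decode a varint from `in`. Returns the number of bytes consumed, or 0
 * if the input is truncated or longer than 10 bytes. Excess high bits in
 * a 10th byte are silently dropped in this sketch. */
size_t varint_decode(const uint8_t *in, size_t in_len, uint64_t *value) {
    uint64_t result = 0;
    for (size_t i = 0; i < in_len && i < 10; i++) {
        result |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

The value 300 encodes to the two bytes `0xAC 0x02`, while a fixed-width `uint64_t` would take eight. The byte-at-a-time loop with a data-dependent branch is also where the decoding cost comes from.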

link

rapsey 2374 days ago

That variable integer encoding is however slow for encoding/decoding. The space savings are of questionable worth.

link

seriesf 2374 days ago

They are the same size as UTF-8 numbers but much slower to decode. I think the more-bit format is the only glaring mistake in proto that can never be fixed.

link

pantalaimon 2374 days ago

This gets hairy the moment you want to add new fields.

link

CoolGuySteve 2374 days ago

Both protobuf and plain C structs are append only formats if you put the message-type and size at the start of the C-struct.

link

marcan_42 2374 days ago

C structs do not compose extensibly. Protobufs do. You can't put variable length data into a struct, and hence you can't put extensible structs into it either.

link

doublement 2374 days ago

You can put variable length data into a struct:

https://en.wikipedia.org/wiki/Flexible_array_member
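
A short sketch of what that looks like in C99 and later. `blob` and `blob_new` are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct blob {
    uint32_t len;
    uint8_t  data[];   /* flexible array member: must be the last field */
};

/* Allocate a blob with room for `len` payload bytes and copy them in.
 * Returns NULL on allocation failure; the caller frees the result. */
struct blob *blob_new(const void *src, uint32_t len) {
    struct blob *b = malloc(sizeof *b + len);
    if (b == NULL) {
        return NULL;
    }
    b->len = len;
    if (len > 0) {
        memcpy(b->data, src, len);
    }
    return b;
}
```

The header and payload end up in one contiguous allocation, so the whole thing can be written out with a single call once `len` has been put into a fixed byte order.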

link

xyzzyz 2374 days ago

Only one field can be variable length, and it must be last. I'll pass.

link

CoolGuySteve 2374 days ago

You definitely can, but it's not as obvious: make a separate message type for list elements and append them on the wire. If you only have one list at the tail, you can use a flexible array[] at the end but it's finicky to deal with if you need more than one.

You can build large hierarchical structures of messages with lists contained therein. It's pretty much how .mov/.mp4/many, many media container formats work. The technique dates back to the Amiga days.
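
A sketch of how a reader might walk such a stream, assuming a made-up record layout of a 2-byte type, a 2-byte payload length (both big-endian), then the payload. The layout and the names `walk_records` and `sum_payload_bytes` are assumptions for the example, not any particular container format.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback invoked once per record. */
typedef void (*record_fn)(uint16_t type, const uint8_t *payload,
                          uint16_t len, void *ctx);

/* Walk every record in `buf`. Because each record carries its own size,
 * unknown types can simply be skipped. Returns the number of records
 * visited, or -1 if the input is truncated. */
int walk_records(const uint8_t *buf, size_t len, record_fn fn, void *ctx) {
    size_t pos = 0;
    int count = 0;
    while (pos < len) {
        if (len - pos < 4) {
            return -1;                       /* incomplete header */
        }
        uint16_t type = (uint16_t)((buf[pos] << 8) | buf[pos + 1]);
        uint16_t plen = (uint16_t)((buf[pos + 2] << 8) | buf[pos + 3]);
        pos += 4;
        if (len - pos < plen) {
            return -1;                       /* payload runs past the end */
        }
        if (fn != NULL) {
            fn(type, buf + pos, plen, ctx);
        }
        pos += plen;
        count++;
    }
    return count;
}

/* Example callback: add up payload sizes into a size_t pointed to by ctx. */
void sum_payload_bytes(uint16_t type, const uint8_t *payload,
                       uint16_t len, void *ctx) {
    (void)type;
    (void)payload;
    *(size_t *)ctx += len;
}
```

A list is then just a run of element records, and nesting works by letting a payload itself contain records and calling `walk_records` on it.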

link

doublement 2374 days ago

Yeah I know that pain, there needs to be a consistent header with a version in all the messages.

link

caffeine 2374 days ago

https://github.com/real-logic/simple-binary-encoding is a good way to do this

link

izacus 2374 days ago

This is practically exactly what Protobuffers are. Except that they are actually defined clearly enough that multiple services written in multiple languages can work with them.

link

CoolGuySteve 2374 days ago

Definitely not, protobuf's strange wire format becomes apparent if you ever look at the hexdump of one or the profiler output of your favourite protobuffer-decoding C/C++ application.

They're actually kind of performance heavy for no benefit.

link

imtringued 2374 days ago

I once looked at a benchmark that compared protobuffer, message pack, json and a variety of other serialization formats. In terms of reducing bytes per message, gzipped json was ahead of all of them at the cost of increased CPU time for gzip. Protobuffer did pretty poorly; the only benefit was decreased CPU usage. I'm sure you could use some other compression algorithm like LZMA to get both good compression and good performance for JSON messages.

link

kentonv 2374 days ago

> In terms of reducing bytes per message gzipped json was ahead of all of them

Try gzipping the protobuf. Binary encoding and compression are different things which can be stacked. Gzipped protobuf should be smaller and faster than gzipped json in basically all cases.

link

CoolGuySteve 2374 days ago

I use LZ4 (with "best" compression) for packet captures and replay with great results.

I get about a 37% compression ratio with extremely fast decoding, like 10 million packets per second off an SSD.

It was better than snappy, gzip, and bz2 for the trade-off of compression time, decompression time and file size.

As for protobuf: flatbuffers, Cap'n Proto, HDF5, and plain C structs all deliver much, much faster decoding time. It's really not the best answer for any serialization at this point but it's still inexplicably popular.

link

touisteur 2374 days ago

Were you thinking of https://www.lucidchart.com/techblog/2019/12/06/json-compress... ?

link

cma 2374 days ago

I thought that's what Cap'n Proto was, not protobuffers.

link

grandmczeb 2374 days ago

> This is practically exactly what Protobuffers are.

Not really. Encoding/decoding protobufs is straightforward, but not nearly as simple as you’re suggesting.

link

izacus 2374 days ago

Sure, but anything you're trying to transport between languages which don't even agree on endianness will end up like this.

Dumping a struct on a wire is just a wishful dream that turns into a nightmare as soon as you need to send that to a service written in another language or running on another architecture.

Don't get me wrong - there's plenty of insanity in protobufs. But trying to cover the same use-case will not create a simple protocol.

link

grandmczeb 2374 days ago

Cap'n proto didn't "end up like this" and works across languages.

link

heavenlyblue 2372 days ago

Cap'n Proto isn’t well supported apart from C or Rust.

The Python library is an absolute nightmare. Their tests used to catch Exception, and what they ended up testing was basically whether their tests tried to access nonexistent attributes.

The issue is that capnproto is relatively more complex, and as such is harder to implement well.

link

ping_pong 2374 days ago

XDR and ONC/RPC for the win!

link

nabla9 2374 days ago

The memory layout of a C struct is ABI and compiler dependent.

Some compilers conform to same ABI in same system or similar system and work almost exactly the same, so you may grow old thinking that's how it is until it's too late. I think gcc, clang and Intel work almost the same in Linux and OSX.

link

doublement 2374 days ago

Indeed, that's why I specified putting the members of the C structure on the wire, not the structure as a whole, so it's just basic types in network byte order (i.e. consistent endian-ness) being sent.
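
To make that concrete, here is a minimal sketch of member-by-member serialization in network byte order. The `sensor_reading` struct and all function names are hypothetical; explicit shifts are used in place of `htonl`/`htons` so the code does not depend on host endianness or on struct padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message, used only for illustration. */
struct sensor_reading {
    uint16_t sensor_id;
    uint32_t timestamp;
    int32_t  value;
};

enum { SENSOR_READING_WIRE_SIZE = 2 + 4 + 4 };

static uint8_t *put_u16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
    return p + 2;
}

static uint8_t *put_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

static const uint8_t *get_u16(const uint8_t *p, uint16_t *v) {
    *v = (uint16_t)(((unsigned)p[0] << 8) | p[1]);
    return p + 2;
}

static const uint8_t *get_u32(const uint8_t *p, uint32_t *v) {
    *v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return p + 4;
}

/* Write each member big-endian, one after another. Returns bytes
 * written, or 0 if `out` is too small. */
size_t sensor_reading_encode(const struct sensor_reading *r,
                             uint8_t *out, size_t out_len) {
    if (out_len < SENSOR_READING_WIRE_SIZE) {
        return 0;
    }
    uint8_t *p = out;
    p = put_u16(p, r->sensor_id);
    p = put_u32(p, r->timestamp);
    p = put_u32(p, (uint32_t)r->value);
    return (size_t)(p - out);
}

/* Read the members back in the same order. Returns 0 on success, -1 if
 * the input is too short. */
int sensor_reading_decode(const uint8_t *in, size_t in_len,
                          struct sensor_reading *r) {
    if (in_len < SENSOR_READING_WIRE_SIZE) {
        return -1;
    }
    uint32_t raw;
    in = get_u16(in, &r->sensor_id);
    in = get_u32(in, &r->timestamp);
    in = get_u32(in, &raw);
    /* Implementation-defined for raw > INT32_MAX; wraps as expected on
     * two's complement platforms. */
    r->value = (int32_t)raw;
    return 0;
}
```

The wire size is 10 bytes even though `sizeof(struct sensor_reading)` is typically 12, because the padding the compiler inserts after `sensor_id` never reaches the wire.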

link

itronitron 2374 days ago

I've worked on an application where that was the standard data transfer scheme, and then while working with protobuf on another project felt that after looking under protobuf's covers it was doing something very similar but wrapping an entire API around it.

link

CoolGuySteve 2374 days ago

No, not really. #pragma pack and/or __attribute__((packed)) have been supported for eons now and guarantee the alignment of struct members between compilers.

In newer C++ specs, you can also static assert that the struct is a POD type to statically ensure that there's no accidental vtable pointer.

This argument pops up every time someone mentions this and every time it's completely uninformed.

link

cyphar 2374 days ago

Though it should be noted that packed structures cause compilers to produce absolutely garbage code when accessing them (because most of the accesses become unaligned) and it becomes incredibly memory-unsafe (as in "your program may crash or corrupt memory") to take pointers of fields inside the struct because they are (usually) presumed to be aligned by the compiler.

Explicit alignment doesn't suffer from this problem nearly as badly (yeah, you might have to add some padding but that's hardly the end of the world -- and if you have explicit padding fields you can reuse them in the future).
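
A small sketch of both points, under the assumption of a C11 compiler; the names are made up for the example. `load_u32_unaligned` shows the usual safe way to read a value from a possibly misaligned address, and `record_padded` shows explicit padding with a reserved field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from any address, aligned or not. Copying through
 * memcpy avoids forming a misaligned uint32_t pointer; compilers
 * typically turn this into a single load where the target allows it. */
uint32_t load_u32_unaligned(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Explicitly padded alternative to a packed struct: every member sits
 * at its natural alignment, and the padding is a named field that can
 * be given a meaning later. */
struct record_padded {
    uint8_t  kind;
    uint8_t  reserved[3];
    uint32_t value;
};

_Static_assert(sizeof(struct record_padded) == 8, "unexpected layout");
_Static_assert(offsetof(struct record_padded, value) == 4, "bad offset");
```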

link

amacbride 2374 days ago

XDR FTW!

link

BubRoss 2374 days ago

Why even put them in network byte order? Every modern system is little endian, if you standardize on that, only exotic systems would have to deserialize anything.

link

syncsynchalt 2374 days ago

If you force the most common system to translate byte order, then you'll have some confidence that your code is performing the translation correctly. If instead you rely on hoping that everyone added the correct no-op translation calls everywhere, you'll find your code doesn't work as soon as you port it to another CPU.

This is a nice side effect of network byte order being the opposite of the dominant cpu order, though obviously it was never intended.

link

doublement 2374 days ago

Because when someone builds a hugely popular exotic system in the future, because it is one (1) cent cheaper, you'd end up with code that has to check to see if it's running on such a system.

link

BubRoss 2374 days ago

This doesn't make any sense for multiple reasons, but especially because you wouldn't be checking anything in the first place. A big endian system would reorder bytes and a little endian system would just use it directly from memory without another copy or reordering anything.

link

toast0 2374 days ago

There's not a library pattern for host to little endian, or little endian to host, like we have with hton and ntoh. Which makes it more likely to be messed up.
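
For what it's worth, such helpers are short to write by hand. A sketch with hypothetical names `store_le32` and `load_le32`; because they use shifts rather than casts, the same code is correct on both little- and big-endian hosts. Some platforms also ship non-standard equivalents such as `htole32` and `le32toh` in `<endian.h>` or `<sys/endian.h>`, but those are not part of ISO C or POSIX.

```c
#include <stdint.h>

/* Store `v` at `p` in little-endian order, regardless of host order. */
void store_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Load a little-endian 32-bit value from `p`, regardless of host order. */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```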

link

bserfaty 2374 days ago

Ha - I just had this exact argument yesterday. Why indeed?

link