Hacker News
by jeroenhd 1419 days ago
Looking at the source code, this seems to work by generating dedicated parser code for a given definition, which copies values in a certain order through a flat copy.

I'm seeing few specifications or conversions regarding endianness, so I'm guessing that's out of scope for this project. It seems almost completely backwards incompatible and I'm not too sure about their security validations. I don't think this and FlatBuffers are competing in the same space, really.

I definitely believe this is fast: it's as close to a memcpy to a network packet as you can get. I'd be wary of using this on external data in any native language without any kind of fuzzing first.

That said, I do like the way the generators work.
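
To make the flat-copy and endianness points above concrete, here is a minimal sketch in Python. It is not this project's generated code: the three-field layout and the names `LAYOUT`, `encode` and `decode` are made up for illustration. It only shows what pinning the byte order and checking the length before reading untrusted bytes could look like.

```python
import struct

# Hypothetical fixed layout: u16 id, u32 timestamp, f32 value.
# "<" pins the byte order to little-endian and disables padding, so the
# wire format does not depend on the host CPU.
LAYOUT = struct.Struct("<HIf")


def encode(msg_id, timestamp, value):
    """Pack the fields back to back in a fixed order (the 'flat copy')."""
    return LAYOUT.pack(msg_id, timestamp, value)


def decode(buf):
    """Unpack a message, rejecting buffers of the wrong size first."""
    if len(buf) != LAYOUT.size:
        raise ValueError(f"expected {LAYOUT.size} bytes, got {len(buf)}")
    return LAYOUT.unpack(buf)


packet = encode(7, 1_600_000_000, 1.5)
fields = decode(packet)
```

A length check like this is the bare minimum. It says nothing about whether the field values themselves are sane, which is where fuzzing would come in.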


Generating code per message might not be the right choice anyway: if you have a lot of messages, a table-driven approach can save you a lot of code size. Optimizing for speed in microbenchmarks can lead one to pessimize overall program architecture in ways that are hard to undo later.
It depends on your application, but if it's just generated code then I don't really see the problem with code size. As long as it's easy to (de)serialize data or add a nice big facade between the generated code and business logic, the generated backing code can be a complete mess of spaghetti code for all I care.
Size of generated code matters when the target is WASM to run in browsers.
But only if you have a lot of message formats...

Generally, message specifications are written by hand, so even a big project may only have a couple of hundred. Doesn't sound so bad.

Also, presumably, if code size really is a big concern, you can decode this in more code-efficient ways too, as long as you are less concerned with performance.

You might be surprised to see how quickly mechanically-generated marshaling code adds up. Look at Android's frameworks.jar one day and see how much is just the same AIDL-generated Parcel-manipulation code over and over again. Cache effects and IO costs are so severe in modern computing that my default (but rebuttable!) position is to prefer table-driven approaches over code generation wherever possible.

Windows COM switched from codegen to table-driven "stubless" marshaling decades ago at no noticeable cost to performance and with a huge code size win.
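
For readers unfamiliar with the distinction, a table-driven decoder might look roughly like the sketch below. The message table, field names and `decode` function are all hypothetical and are not taken from COM, AIDL or the project under discussion. The point is only that each message type is a row of data interpreted by one shared routine, rather than a separate generated function.

```python
import struct

# Hypothetical message table: one row per message type, each row a list of
# (field name, struct format code) pairs. Adding a message adds data, not code.
MESSAGES = {
    "Ping": [("seq", "I")],
    "Position": [("id", "H"), ("x", "f"), ("y", "f")],
}


def decode(msg_type, buf):
    """Single generic decoder shared by every message type in the table."""
    fields = MESSAGES[msg_type]
    # Build the little-endian, unpadded layout from the table row.
    layout = struct.Struct("<" + "".join(code for _, code in fields))
    if len(buf) != layout.size:
        raise ValueError(f"{msg_type}: expected {layout.size} bytes, got {len(buf)}")
    values = layout.unpack(buf)
    return {name: value for (name, _), value in zip(fields, values)}


position = decode("Position", struct.pack("<Hff", 3, 1.0, -2.5))
```

Python hides the code-size effect, of course. In a native or WASM target the trade is between hundreds of near-identical generated functions and one small interpreter loop plus compact tables, at the cost of some per-field dispatch overhead.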