by t0mas88 547 days ago
The type of protocol (message type, bitmap to define fields, followed by a set of fixed and variable length values) is pretty normal for the time it was developed in. Many low level things are basically packed C-structs with this type of protocol. It comes with some pitfalls on the receiver side: you have to carefully validate dynamic field lengths, refuse to read past the end of the message, and avoid allocating an unbounded buffer. But all of those are well understood by now.
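A minimal Python sketch of those receiver-side checks, assuming ASCII encoding, a 4-byte message type, an 8-byte primary bitmap where field n is bit n counted from the most significant bit, and a tiny hypothetical field table (the real ISO 8583 field dictionary is much larger and varies by network):

```python
# Sketch only: ASCII encoding and this tiny field table are assumptions, not ISO 8583's real dictionary.
FIELD_SPEC = {
    2: ("llvar", 19),  # hypothetical: 2 ASCII length digits, then at most 19 bytes
    3: ("fixed", 6),   # hypothetical: exactly 6 bytes
    4: ("fixed", 12),  # hypothetical: exactly 12 bytes
}


class ParseError(ValueError):
    pass


def parse_message(buf: bytes) -> tuple[str, dict[int, bytes]]:
    """Parse a 4-byte message type, an 8-byte primary bitmap, then the flagged fields."""
    if len(buf) < 12:
        raise ParseError("message shorter than message type + bitmap")
    mti = buf[:4].decode("ascii")
    bitmap = int.from_bytes(buf[4:12], "big")
    if bitmap & (1 << 63):
        raise ParseError("secondary bitmap present; not handled in this sketch")
    pos = 12
    fields: dict[int, bytes] = {}
    for field in range(2, 65):
        if not bitmap & (1 << (64 - field)):  # field n is bit n, counting from the MSB
            continue
        if field not in FIELD_SPEC:
            # No generic length rule exists, so an unknown field cannot even be skipped.
            raise ParseError(f"field {field} is present but not in the spec")
        kind, max_len = FIELD_SPEC[field]
        if kind == "llvar":
            prefix = buf[pos:pos + 2]
            if len(prefix) != 2 or not prefix.isdigit():
                raise ParseError(f"bad length prefix for field {field}")
            length = int(prefix)
            if length > max_len:  # never trust the declared length blindly
                raise ParseError(f"field {field} declares {length} bytes, max is {max_len}")
            pos += 2
        else:
            length = max_len
        if pos + length > len(buf):  # refuse to read past the end of the message
            raise ParseError(f"field {field} runs past the end of the message")
        fields[field] = buf[pos:pos + length]
        pos += length
    if pos != len(buf):
        raise ParseError("trailing bytes after the last field")
    return mti, fields
```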

What I find baffling is that this "standard" does not specify how to encode the fields or even the message type. Pick anything, binary, ASCII, BCD, EBCDIC. That doesn't work as a standard, every implementation could send you nearly any random set of bytes with no way for the receiver to make sense of them.
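To illustrate, here is the same message type "0200" under three of those encodings, as a small Python sketch. cp037 is used as one common EBCDIC code page; which variant a given implementation picks is exactly the unspecified part.

```python
def bcd_pack(digits: str) -> bytes:
    """Packed BCD: two decimal digits per byte (an even digit count is assumed)."""
    if len(digits) % 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected an even number of ASCII decimal digits")
    # For decimal digits, packed BCD nibbles are the same as their hex nibbles.
    return bytes.fromhex(digits)


mti = "0200"
encodings = {
    "ascii": mti.encode("ascii"),
    "ebcdic (cp037)": mti.encode("cp037"),
    "packed bcd": bcd_pack(mti),
}
for name, raw in encodings.items():
    print(f"{name:>15}: {raw.hex(' ')}")
```

This prints 30 32 30 30, f0 f2 f0 f0 and 02 00 respectively, and nothing in the message itself says which one the sender chose.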


> Many low level things are basically packed C-structs with this type of protocol.

Not really: C structs notably don't have variable-length fields, but ISO 8583 very much does.

To add insult to injury, it does not offer a standard way to determine field lengths. This means that in order to ignore a given field, you'll need to be able to parse it (at least at the highest level)!

Even ASN.1, not exactly the easiest format to deal with, is one step up from that (in a TLV structure, you can always skip unknown types by just skipping "length" bytes).
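A sketch of why TLV makes skipping cheap, using a simplified TLV with a 1-byte tag and a 1-byte length (real BER allows multi-byte tags and long-form lengths; the tag values and names here are hypothetical):

```python
def iter_tlv(buf: bytes):
    """Yield (tag, value) pairs from a stream of 1-byte-tag, 1-byte-length records."""
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated TLV header")
        tag, length = buf[pos], buf[pos + 1]
        start, end = pos + 2, pos + 2 + length
        if end > len(buf):
            raise ValueError(f"tag {tag:#04x} runs past the end of the buffer")
        yield tag, buf[start:end]
        pos = end  # the length alone is enough to jump to the next record


KNOWN_TAGS = {0x01: "name", 0x02: "id"}  # hypothetical schema


def decode_known(buf: bytes) -> dict[str, bytes]:
    """Keep the tags we understand; unknown ones are skipped without being parsed."""
    return {KNOWN_TAGS[t]: v for t, v in iter_tlv(buf) if t in KNOWN_TAGS}
```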

> Not really: C structs notably don't have variable-length fields

Feast your eyes: C99 introduced an ~~abomination~~ feature called flexible array members (FAM), which allows the last member of a struct to be a variable length array.

If you want to ~~gouge your eyes~~ learn more, see section 6.7.2.1, paragraph 16 [0].

> To add insult to injury, it does not offer a standard way to determine field lengths

That's awful. You can sort of say the same about variable-length structs in C, but at least the struct type definition usually has a field that tells you the length of the variable-length array at the end.
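For comparison, a Python sketch of that pattern: a fixed header whose count field tells the receiver how long the trailing array is. The wire layout (little-endian, packed, a uint16_t count followed by uint32_t items) is an assumption for illustration; a real C compiler would typically insert padding between those members.

```python
import struct

# Hypothetical packed wire layout mirroring: struct msg { uint16_t count; uint32_t items[]; };
HEADER = struct.Struct("<H")


def parse_msg(buf: bytes) -> list[int]:
    """Read the count from the header, check the buffer is long enough, then read the array."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too short for header")
    (count,) = HEADER.unpack_from(buf, 0)
    needed = HEADER.size + 4 * count
    if len(buf) < needed:
        raise ValueError(f"header claims {count} items but the buffer is too short")
    return list(struct.unpack_from(f"<{count}I", buf, HEADER.size))
```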

[0] https://rgambord.github.io/c99-doc/sections/6/7/2/1/index.ht...

> [...] feature called flexible array members (FAM), which allows the last member of a struct to be a variable length array.

Oh, ISO 8583 has these too!

Sometimes they're even combined with the "feature" described in the article where there's a variable number of fixed-length elements, except for the last element, which is a variable-length string (or sometimes the last field type repeated n times). That's always "fun" to work with.

ISO 8583 really is a living museum of all ideas people had about binary encoding in the last half century or so.

FAM was a (not so successful) attempt to standardise some existing usage.

As far as I'm concerned, we solved binary formats with ASN.1 and its various encodings. Everything afterwards has been NIH, ignorance, and square wheel reinvention.

ASN.1 DER, BER, or OER? Implicit and optional can really break compat in surprising ways. Then there are the machine unfriendly roster of available types. XDR was more tuned for that.

Finally free tooling doesn't really exist. The connection to the OSI model also didn't help.

> ASN.1 DER, BER, or OER?

Or XER or JER! One of the brilliant things about ASN.1 is that it decouples the data model from the serialization format. Of the major successor systems, only protobuf does something similar, and the text proto format barely counts.

> Implicit and optional can really break compat in surprising ways

Any implementation of any spec can be broken. You could argue that the spec should be simpler and require, e.g., explicit tagging everywhere, like protobuf. Sure. But the complexity enables efficiencies, and it's sometimes worth making a foundational library a bit more complex to enable simplifications and optimizations throughout the ecosystem.
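A small sketch of the implicit/explicit difference, hand-encoding INTEGER 5 in DER with and without a [0] tag. With IMPLICIT tagging the INTEGER tag disappears from the wire, so a decoder needs the schema to know what 80 01 05 means; EXPLICIT keeps the original TLV inside a constructed wrapper at the cost of two bytes.

```python
def tlv(tag: int, content: bytes) -> bytes:
    """One DER TLV with a single-byte tag and a short-form length (content under 128 bytes)."""
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are out of scope for this sketch")
    return bytes([tag, len(content)]) + content


UNIVERSAL_INTEGER = 0x02
CONTEXT_0_PRIMITIVE = 0x80    # class = context-specific, primitive, tag number 0
CONTEXT_0_CONSTRUCTED = 0xA0  # class = context-specific, constructed, tag number 0

plain = tlv(UNIVERSAL_INTEGER, b"\x05")       # INTEGER 5
implicit = tlv(CONTEXT_0_PRIMITIVE, b"\x05")  # [0] IMPLICIT INTEGER: the INTEGER tag is replaced
explicit = tlv(CONTEXT_0_CONSTRUCTED, plain)  # [0] EXPLICIT INTEGER: the INTEGER TLV is wrapped

for name, der in [("plain", plain), ("implicit", implicit), ("explicit", explicit)]:
    print(f"{name:>8}: {der.hex(' ')}")
```

This prints 02 01 05, 80 01 05 and a0 03 02 01 05.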

> Then there are the machine unfriendly roster of available type

Protobuf's variable-length integers are machine-friendly now? :-) We can always come up with better encoding rules without changing the fundamental data structures.
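For reference, a Python sketch of protobuf-style base-128 varints (non-negative values only). Each byte carries 7 payload bits plus a continuation bit, so the decoder has to test every byte before it knows the length; the 10-byte cap mirrors protobuf's limit for 64-bit values.

```python
def encode_varint(n: int) -> bytes:
    """Little-endian base-128: low 7 bits first, high bit set while more bytes follow."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_pos), branching on every byte's continuation bit."""
    result = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 10 bytes")
```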

> Finally free tooling doesn't really exist.

What do you mean? You use ASN.1 to talk to every server talking SNMP, LDAP, or the X.509 PKI. Every programming environment has a way to talk about ASN.1.

> The connection to the OSI model also didn't help.

Agreed. The legacy string types aren't great either. You can, of course, do ASN.1 better. No technology is perfect. But what we don't need, IMHO, is more investment in "simple" technologies like varlink that end up being too simple and shunting complexity and schema-ness that belongs in a universal foundation layer into every single application using the new "simple" thing.

> ASN.1 DER, BER, or OER? Or XER or JER!

My opinion is that DER is better. (DER is a restricted form of BER: any DER file is also a valid BER file, but DER adds requirements on the encoding so that it is a canonical form. The other canonical form is CER, but in my opinion DER is better.)
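To make the DER/BER relationship concrete, a sketch of definite-length encoding: DER requires the shortest form, while BER also accepts padded long forms (and an indefinite form, which this sketch rejects rather than handles).

```python
def der_length(n: int) -> bytes:
    """Minimal definite-length encoding, as DER requires."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 0x80:
        return bytes([n])  # short form: one byte
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")  # no leading zero bytes
    return bytes([0x80 | len(body)]) + body  # long form: byte count, then the length


def read_length(buf: bytes, pos: int, der: bool) -> tuple[int, int]:
    """Decode a definite length at buf[pos]; with der=True, reject forms only BER allows."""
    if pos >= len(buf):
        raise ValueError("truncated length")
    first = buf[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length (BER only) is not handled in this sketch")
    body = buf[pos + 1:pos + 1 + count]
    if len(body) != count:
        raise ValueError("truncated long-form length")
    value = int.from_bytes(body, "big")
    if der and (value < 0x80 or body[0] == 0):
        raise ValueError("non-minimal length: valid BER, but not DER")
    return value, pos + 1 + count
```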

> Every programming environment has a way to talk about ASN.1.

Not all implementations are well designed, though; I have seen many implementations of ASN.1 that are not. I made up my own in the hope that it would be better, but we will see whether it (hopefully) is.

> But what we don't need, IMHO, is more investment in "simple" technologies like varlink that end up being too simple

I agree with this, and it is important. That was my intention when designing things: to be neither too simple nor too complicated. Most stuff I see tends to be either too complicated or too simple, so I try to make stuff better than that.

There are zero free ASN.1 compilers or module checkers.

> There are zero free ASN.1 compilers or module checkers.

I must be misunderstanding what you're saying, because this exists: <https://www.erlang.org/doc/apps/asn1/asn1ct.html#>

From the linked page:

> asn1ct

> ASN.1 compiler and compile-time support functions

> The ASN.1 compiler takes an ASN.1 module as input and generates a corresponding Erlang module, which can encode and decode the specified data types. Alternatively, the compiler takes a specification module specifying all input modules, and generates a module with encode/decode functions.

XML also decouples the data model and serialization with the XML Infoset specification.

I think ASN.1 is good, but there are some problems with it. It should not need separate type numbers for the different ASCII-based string types and separate type numbers for the different ISO-2022-based string types; one number for ASCII and one number for ISO-2022 would do, and the restrictions should be part of the schema rather than part of the BER/DER encoding. Furthermore, I think there are too many date/time types. Also, some details of the other types (e.g. the real number type) are messier than they would be with a better design.

I made up "ASN.1X", which adds some types (key/value list, TRON string, PC string, BCD string, Morse string, reference, out-of-band) and deprecates others (such as OID-IRI and some of the date/time types). The many different ASCII-based and ISO-2022-based string types are kept, because a schema might have different uses for them in a SEQUENCE OF, a SET OF, or a structure with optional fields, even though I would not have had so many types like that if I were designing it from the start. ASN.1X also adds a few further restrictions (e.g. it must be possible to determine the presence or absence of optional fields without looking ahead), as well as some schema types (e.g. OBJECT IDENTIFIER RELATIVE TO). (X.509 does not violate these restrictions, as far as I know.)

I also have an idea for a new OID arc that will not require registration (there are already some, but this idea works differently in some respects, including a structure that fits better with how OIDs work). I can write (and have partially written) an initial proposal document describing how it could work, but it would need to be managed by ITU or ISO. (The OIDs are based on timestamps and various other kinds of identifiers that may already be registered at a specific time; even if those identifiers are not permanent, the OIDs will be permanent because of the timestamps. It also includes some features such as automatic delegation for some types.)

There are different serialization formats for ASN.1 data; I think DER is best and that CER, JER, etc. are no good. I also invented a text-based format that can be converted to DER, and I wrote a program that implements that conversion. (The text format is not really meant to be used by other programs, since parsing it is more complicated than parsing DER; using a separate program to convert it to DER avoids adding that complexity to programs that do not need it.)

A bitmap to define field presence doesn’t seem so offensive, as far as serialization formats go. FlatBuffers[1] use a list of offsets instead, but that’s in the context of expecting to see many identically-shaped records. One could argue that Cap’n Proto with zero-packing[2] amounts to the same thing if you squint, just with the bitmap smeared all over the message.
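A simplified Python sketch of the Cap'n Proto packing idea from [2]: each 8-byte word becomes a tag byte whose bit i (least significant first) marks byte i as non-zero, followed by only the non-zero bytes. The real scheme adds run-length special cases for tags 0x00 and 0xFF, which are omitted here, so this output is not wire-compatible with it.

```python
def pack_words(data: bytes) -> bytes:
    """Per 8-byte word: emit a presence bitmap byte, then only the non-zero bytes."""
    if len(data) % 8:
        raise ValueError("input must be a whole number of 8-byte words")
    out = bytearray()
    for i in range(0, len(data), 8):
        word = data[i:i + 8]
        tag = 0
        for bit, b in enumerate(word):
            if b:
                tag |= 1 << bit  # bit 0 corresponds to the first byte of the word
        out.append(tag)
        out.extend(b for b in word if b)
    return bytes(out)


def unpack_words(packed: bytes) -> bytes:
    """Inverse of pack_words: each tag bit says whether to take a byte or emit a zero."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        tag = packed[pos]
        pos += 1
        for bit in range(8):
            if tag & (1 << bit):
                if pos >= len(packed):
                    raise ValueError("truncated packed data")
                out.append(packed[pos])
                pos += 1
            else:
                out.append(0)
    return bytes(out)
```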

I mean, this specific thing sounds like it should have been a fatter but much more unexciting TLV affair instead. But given it’s from 1987, it’d probably have ended up as ASN.1 BER in that case (ETA: ah, and for extensions, it mostly did, what fun), instead of a simpler design like Protobufs or MessagePack/CBOR, so maybe the bitmaps are a blessing in disguise.

[1] https://google.github.io/flatbuffers/flatbuffers_internals.h...

[2] https://capnproto.org/encoding.html#packing

I'd trade the field layer of ISO 8583 for some ASN.1 any day!

Luckily, there's a bit of everything in the archeological site that is ISO 8583, and field 55 (where EMV chip data goes, and EMV itself is quite ASN.1-heavy, presumably for historical reasons) and some others in fact contain something very close to it :)

Telegram's "TL" serialization, which is part of its network protocol, also uses a bitmap for optional fields. It's an okay protocol overall. The only problem is that the official documentation[1] was written by Nikolay Durov, who is a mathematician. He just loves to overgeneralize everything to a ridiculous degree and spend two screens' worth of outstandingly obtuse text to say what amounts to, for example, "some types have type IDs before them and some don't because the type is obvious from the schema".

[1] https://core.telegram.org/mtproto

Something similar is TLV, which is extremely common in binary network protocols because it's very flexible for compatibility.