| I know of at least one problem with ASN.1: the string encodings other than UTF-8 are terrible. Most of them are very limited, odd subsets of ASCII that nobody uses anymore. ASN.1 itself doesn't define the encodings and just refers to other standards. The problem is probably most notable with T.61, which changed over the years; since ASN.1 only references the other standard, nobody is quite sure exactly what you have to support for T.61 to work right. Within X.509 certificates, though, nobody bothers to implement T.61, and implementations just use the T.61 tag for ISO-8859-1. There are a bunch of gory details around this mess in this (now quite old) write-up: https://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt
Since that write-up, I believe UTF-8 has become pretty much the expected character encoding for X.509. I documented some of the quirks around 6 years ago when I took an existing X.509 parser and improved it for use in certificate trust management in Subversion: http://svn.apache.org/viewvc/subversion/trunk/subversion/lib... Basically, ASN.1 wasn't well defined, and it only works well when people agree to use only certain features or to interpret ambiguous things in a particular way. It's also notoriously difficult to parse well. It's very easy to have bugs in your parser, even if you're implementing only the subset needed for X.509, and especially if you're doing so in a memory-unsafe language. I can't speak for why Google invented Protobufs, but I can't imagine anyone sane picking up ASN.1 for anything modern and deciding that this is what they want to use. |
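To make the "T.61 flag for ISO-8859-1" convention concrete, here is a minimal Python sketch of a lenient decoder keyed on the ASN.1 universal tag number. The tag numbers are the standard ones; the function name `decode_asn1_string` and the table `STRING_CODECS` are hypothetical, and the codec choices (Latin-1 for TeletexString, UTF-16-BE for BMPString) are the lenient convention described above, not something the standards mandate. It is not taken from the Subversion parser.

```python
# Universal tag numbers for ASN.1 string types commonly seen in X.509 names.
# The codec mapping is an assumed, lenient convention for illustration only.
STRING_CODECS = {
    12: "utf-8",      # UTF8String
    19: "ascii",      # PrintableString (a restricted subset of ASCII)
    20: "latin-1",    # TeletexString / T61String, treated as ISO-8859-1
    22: "ascii",      # IA5String
    28: "utf-32-be",  # UniversalString
    30: "utf-16-be",  # BMPString (strictly UCS-2; UTF-16-BE is a superset)
}


def decode_asn1_string(tag: int, content: bytes) -> str:
    """Decode the content octets of a string value, given its universal tag."""
    try:
        codec = STRING_CODECS[tag]
    except KeyError:
        raise ValueError(f"unsupported string tag: {tag}") from None
    # Invalid bytes raise UnicodeDecodeError, which is a ValueError subclass.
    return content.decode(codec)
```

A stricter parser would also check PrintableString against its permitted character set rather than accepting all of ASCII.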
PKIX actually uses the weird encoding restrictions to our benefit. It defines two kinds of names that things on the Internet might have (you can and should stop trying to name things that are actually on the Internet in any other way): DnsNames and IpAddresses. IpAddresses, since they're arbitrary 32-bit or 128-bit values, are represented as exactly that: 4 or 16 raw octets. So you cannot express the erroneous IPv4 address 100.200.300.400 as an IpAddress, which means you can't trip up somebody's parser with that nonsense address. DnsNames use a deliberately ASCII-only encoding from ASN.1 (IA5String), which can express all the legal DNS names (all A-labels and the ASCII dot . are permissible) but can't express lots of other goofy things, including anything in Unicode beyond ASCII. So a certificate issuer, even a completely incompetent one, cannot write a valid DnsName that expresses some garbage IDN as Unicode. Hopefully they read the documentation and find out they need to use A-labels (Punycode), but if not, they're at least prevented from emitting ambiguous gibberish.
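As a rough illustration of why these representations are hard to get wrong, here is a small Python sketch using only the standard library. The helper names (`ip_from_octets`, `is_plausible_dns_name`, `to_a_labels`) are made up for this example, the label check is a simplified letters-digits-hyphen rule that ignores wildcards, and the built-in `idna` codec implements the older IDNA 2003 rules, so treat it as a sketch rather than a conforming validator.

```python
import ipaddress
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen; 1..63 chars.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def ip_from_octets(octets: bytes):
    """Interpret raw address octets: 4 means IPv4, 16 means IPv6, else reject."""
    if len(octets) not in (4, 16):
        raise ValueError(f"expected 4 or 16 octets, got {len(octets)}")
    return ipaddress.ip_address(octets)


def is_plausible_dns_name(name: str) -> bool:
    """Simplified check: ASCII only, dot-separated letters-digits-hyphen labels."""
    if not name.isascii() or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def to_a_labels(name: str) -> str:
    """Convert a Unicode host name to its A-label (Punycode) form."""
    return name.encode("idna").decode("ascii")


print(ip_from_octets(bytes([192, 0, 2, 1])))        # an IPv4 address from 4 octets
print(is_plausible_dns_name("b\u00fccher.example"))  # Unicode label is rejected
print(to_a_labels("b\u00fccher.example"))            # the A-label form passes the check
```

Note that the octet form has no room for 100.200.300.400 at all: each octet holds only 0 to 255, so `bytes([100, 200, 300, 400])` fails before any address parsing happens.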
Even in forums where you'd once have expected pushback, "Just use UTF-8" is becoming more widespread. Microsoft, for example, would once have put up at least some token resistance; today they're likely to agree: "Just use UTF-8". So ASN.1 ends up no worse off for having half a dozen bad ways to write text that you shouldn't use, compared to, say, XML, HTML, and so on.