by kevin_thibedeau 1619 days ago
10 different string encodings is one problem.

Is it? You pick the one that fits your use, normally UTF8String these days.
Can one use UTF8?

The 90s were rough on text encoding, but it seems pretty settled now.

> Can one use UTF8?

For new standards, yes. But ASN.1 was first specified in the '80s, and backwards compatibility is a thing. So really it depends on what you're doing: if you can start with a subset of ASN.1, which I think is done in MDER[0] and OER[1], you have a bit more freedom. But if you're working in legacy formats and standards that operate internationally, you could run into problems.

[0]: https://www.iso.org/standard/66717.html

[1]: see among others https://www.ntcip.org/document-numbers-and-status/

Kerberos implementations generally just-send-whatever in IA5String fields. That means Windows sends UTF-8, and MIT Kerberos and Heimdal send whatever the user's locale uses. Windows doesn't normalize or anything. It works in that a) it interops when using ASCII names, b) it interops when using non-ASCII names in UTF-8 locales on Unix. It violates the spec, but it works.
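To make the interop point concrete, here is a rough Python sketch of how a receiver might classify the bytes that actually arrive in a field declared as IA5String. The function name and the three labels are made up for illustration; this is not taken from any Kerberos implementation.

```python
def classify_ia5_payload(data: bytes) -> str:
    """Classify the raw bytes found in a field declared as IA5String.

    Hypothetical helper for illustration only.
    """
    # IA5 is a 7-bit repertoire, so any byte >= 0x80 is out of spec.
    if all(b < 0x80 for b in data):
        return "ia5"
    # Out of spec, but it may still be well-formed UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Some other locale encoding (Latin-1, etc.); not recoverable
        # without knowing the sender's locale.
        return "other"
    return "utf-8"


print(classify_ia5_payload(b"alice"))                  # ASCII name
print(classify_ia5_payload("zoë".encode("utf-8")))     # UTF-8 sender
print(classify_ia5_payload("zoë".encode("latin-1")))   # Latin-1 locale
```

The first two cases are the ones that interoperate in practice; the third is where a non-UTF-8 Unix locale would produce bytes the other side cannot interpret reliably.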
Stick to UTF8String. ASN.1 predates Unicode.
Or IA5String if you know, ahead of time, that you only need ASCII.
Or do what many implementors do: just send whatever you have as whatever string type the protocol spec requires.
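For what the first two options look like on the wire, here is a minimal DER sketch in Python, assuming you hand-roll the encoding rather than use an ASN.1 library. The universal tag numbers (12 for UTF8String, 22 for IA5String) come from the ASN.1 standard; the helper names and the `ascii_only` flag are hypothetical.

```python
UTF8STRING_TAG = 0x0C  # universal tag 12
IA5STRING_TAG = 0x16   # universal tag 22


def der_length(n: int) -> bytes:
    """Encode a DER definite length (short form below 128, else long form)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_string(text: str, ascii_only: bool = False) -> bytes:
    """Encode text as a UTF8String, or as an IA5String if ascii_only is set.

    Hypothetical helper: with ascii_only=True, non-ASCII input raises
    UnicodeEncodeError instead of being sent in violation of the spec.
    """
    if ascii_only:
        payload = text.encode("ascii")
        tag = IA5STRING_TAG
    else:
        payload = text.encode("utf-8")
        tag = UTF8STRING_TAG
    return bytes([tag]) + der_length(len(payload)) + payload


print(encode_string("héllo").hex())                   # UTF8String
print(encode_string("hello", ascii_only=True).hex())  # IA5String
```

The "just send whatever" approach amounts to skipping the `ascii` check and putting UTF-8 or locale bytes under the IA5String tag anyway.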