by cryptonector 1311 days ago

Yes, [sadly] deprecated by the Unicode Consortium in Unicode 5.2, and by the IETF in RFC 6082. Nor have they been undeprecated since then (see https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf#G115...).

IMO we should have tried harder to make Unicode language tags useful and used. But it didn't happen, so they're a thing of the past. Of course, they're still there, and one could attempt to resurrect them, but most likely one would fail.

Choice quotes below:

https://www.rfc-editor.org/rfc/rfc6082

  > RFC 2482, "Language Tagging in Unicode Plain
  > Text" [RFC2482], describes a mechanism
  > for using special Unicode language tag
  > characters to identify languages when needed.
  > It is an idea whose time never quite came.
  > It has been superseded by whole-transaction
  > language identification such as the MIME
  > Content-language header [RFC3282] and more
  > general markup mechanisms such as those
  > provided by XML.  The Unicode Consortium
  > has deprecated the language tag character
  > facility and strongly recommends against
  > its use.  RFC 2482 has been moved to
  > Historic status to reduce the possibility
  > that Internet implementers would consider
  > that tagging system an appropriate mechanism
  > for identifying languages.
  >
  > A discussion of the status of the language tag
  > characters and their applicability appears
  > in Section 16.9 of The Unicode Standard
  > [Unicode52].

https://www.unicode.org/versions/Unicode5.2.0/ch16.pdf (Section 16.9, in Chapter 16)

  > 16.9 Deprecated Tag Characters
  >
  > Deprecated Tag Characters: U+E0000–U+E007F
  >
  > The characters in this block provide a
  > mechanism for language tagging in Unicode
  > plain text. These characters are deprecated,
  > and should not be used—particularly with any
  > protocols that provide alternate means of
  > language tagging. The Unicode Standard recommends
  > the use of higher-level protocols, such as
  > HTML or XML, which provide for language tagging
  > via markup. See Unicode Technical Report #20,
  > “Unicode in XML and Other Markup Languages.”
  > The requirement for language information embedded
  > in plain text data is often overstated, and
  > markup or other rich text mechanisms constitute
  > best current practice. See Section 5.10,
  > Language Information in Plain Text for further
  > discussion.

(Reformatting is mine.)
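
For anyone who wants to see what these characters look like in practice, here is a rough Python sketch, not anything taken from the RFCs or the Standard. It assumes the RFC 2482 layout as I understand it: U+E0001 LANGUAGE TAG introduces a tag, and the characters U+E0020–U+E007E mirror printable ASCII at an offset of 0xE0000. The function names (`encode_language_tag`, `find_language_tags`, `strip_tag_characters`) are made up for illustration.

```python
import re

# The whole tag block quoted above: U+E0000–U+E007F.
TAG_BLOCK = re.compile("[\U000E0000-\U000E007F]")

# Assumption (from RFC 2482): U+E0001 LANGUAGE TAG starts a tag, followed by
# tag characters that mirror printable ASCII (0x20–0x7E) at offset 0xE0000.
LANGUAGE_TAG = "\U000E0001"
TAG_RUN = re.compile("\U000E0001([\U000E0020-\U000E007E]+)")
TAG_OFFSET = 0xE0000


def encode_language_tag(lang: str) -> str:
    """Return LANGUAGE TAG followed by `lang` shifted into the tag block."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def find_language_tags(text: str) -> list[str]:
    """Return the ASCII form of every LANGUAGE TAG run found in `text`."""
    return [
        "".join(chr(ord(c) - TAG_OFFSET) for c in run)
        for run in TAG_RUN.findall(text)
    ]


def strip_tag_characters(text: str) -> str:
    """Remove every code point in U+E0000–U+E007F from `text`."""
    return TAG_BLOCK.sub("", text)


if __name__ == "__main__":
    tagged = encode_language_tag("fr") + "bonjour"
    print(find_language_tags(tagged))
    print(strip_tag_characters(tagged))
```

Stripping the whole block is a blunt choice: any text that uses these code points for some other purpose would be altered too, so treat this as a way to inspect input rather than a recommended filter.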

1 comment

gnubison 1310 days ago

Oh, “tag” made my brain parse as HTML tags. TIL
