by oshiar53-0 1576 days ago
Fun fact: GB 18030 is a Unicode Transformation Format.

Example: \N{THINKING FACE}\N{FACE WITH TEARS OF JOY}\N{FACE SCREAMING IN FEAR}\N{SMILING FACE WITH SMILING EYES AND THREE HEARTS}\N{PERSON DOING CARTWHEEL}\N{FACE WITH NO GOOD GESTURE}\N{ZERO WIDTH JOINER}\N{FEMALE SIGN}\N{VARIATION SELECTOR-16}\N{EYES}\N{ON WITH EXCLAMATION MARK WITH LEFT RIGHT ARROW ABOVE}\N{SQUARED COOL}\N{VARIATION SELECTOR-16}

In UTF-8:

  00000000: f09f a494 f09f 9882 f09f 98b1 f09f a5b0  ................
  00000010: f09f a4b8 f09f 9985 e280 8de2 9980 efb8  ................
  00000020: 8ff0 9f91 80f0 9f94 9bf0 9f86 92ef b88f  ................

In GB 18030:

  00000000: 9530 cd34 9439 fc38 9530 8335 9530 d636  .0.4.9.8.0.5.0.6
  00000010: 9530 d130 9530 8535 8136 a439 a1e2 8431  .0.0.0.5.6.9...1
  00000020: 8235 9439 cf38 9439 e537 9439 8b32 8431  .5.9.8.9.7.9.2.1
  00000030: 8235                                     .5
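
For anyone who wants to reproduce dumps like the ones above, here is a minimal sketch using Python's built-in `utf-8` and `gb18030` codecs. It uses a shortened sample (just the thinking face and the female sign) rather than the full string, and the `hexdump` helper is a made-up name for illustration.

```python
# Shortened sample: one supplementary-plane emoji and one BMP symbol.
sample = "\N{THINKING FACE}\N{FEMALE SIGN}"


def hexdump(data: bytes) -> str:
    """Hypothetical helper: space-separated hex, one byte per group."""
    return data.hex(" ")


utf8 = sample.encode("utf-8")
gb = sample.encode("gb18030")

print("UTF-8:   ", hexdump(utf8))
print("GB 18030:", hexdump(gb))

# Both encodings decode back to the same text.
print(utf8.decode("utf-8") == gb.decode("gb18030") == sample)
```

The thinking face takes four bytes in both encodings, while the female sign takes three bytes in UTF-8 and two in GB 18030, matching the `a1e2` pair in the dump above.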
2 comments

GB 18030 is carefully designed to work around existing code that expects characters to be at most two bytes long, e.g. Windows's IsDBCSLeadByte(Ex): each four-byte sequence looks like two consecutive lead-byte/trail-byte pairs to such code. That would normally be a bad design for a new-ish encoding, but it's a reasonable one given that it's meant to be a superset of GBK, an already bad but widespread encoding.
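
A rough sketch of why that works, assuming a hypothetical legacy scanner that treats any byte in 0x81..0xFE as a lead byte and blindly consumes the next byte as its trail byte. This is not the Windows API, and `legacy_dbcs_chunks` and `boundaries` are made-up names. The point is that such a scanner never lands in the middle of a GB 18030 character; it just sees each four-byte character as two two-byte ones.

```python
def legacy_dbcs_chunks(data):
    """Split bytes the way a hypothetical two-byte-only scanner would."""
    chunks = []
    i = 0
    while i < len(data):
        # Assumption: 0x81..0xFE is a lead byte followed by one trail byte.
        step = 2 if 0x81 <= data[i] <= 0xFE else 1
        chunks.append(data[i:i + step])
        i += step
    return chunks


def boundaries(chunks):
    """Return the set of byte offsets at which a chunk starts or ends."""
    offsets = {0}
    pos = 0
    for chunk in chunks:
        pos += len(chunk)
        offsets.add(pos)
    return offsets


text = "\N{THINKING FACE}A\N{FEMALE SIGN}"
encoded = text.encode("gb18030")

legacy = boundaries(legacy_dbcs_chunks(encoded))
real = boundaries([ch.encode("gb18030") for ch in text])

# Every real character boundary is also a boundary the legacy scanner sees.
print(real <= legacy)
```

The legacy scanner reports more chunks than there are characters, but it stays aligned, so operations like "skip to the next character" cannot split a byte pair and desynchronize the rest of the string.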
What is a Unicode transformation format?
It's an encoding that can represent every Unicode character. The "UTF" in UTF-8, UTF-16, and UTF-32 stands for Unicode Transformation Format.
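
As a quick illustration (a spot check over a handful of code points I picked, not a proof of full coverage), Python's `gb18030` codec round-trips characters from ASCII up to the last code point, U+10FFFF, just as UTF-8 does:

```python
# Hand-picked samples: ASCII, Latin-1, CJK, an emoji, and the last code point.
samples = ["A", "\u00e9", "\u4e2d", "\N{THINKING FACE}", "\U0010ffff"]

sizes = []
for ch in samples:
    utf8 = ch.encode("utf-8")
    gb = ch.encode("gb18030")
    # Both encodings must decode back to the original character.
    assert utf8.decode("utf-8") == ch
    assert gb.decode("gb18030") == ch
    sizes.append((len(utf8), len(gb)))
    print(f"U+{ord(ch):04X}: UTF-8 {len(utf8)} bytes, GB 18030 {len(gb)} bytes")
```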