by Arnt 2638 days ago
The question doesn't really make sense, because that's not what Unicode is.

Unicode encodes what's necessary for printing books since about 1900 (and a bit more, but that's a fair one-sentence summary). What you want to validate isn't that you'd be able to print every kind of book printed since 1900. You're only interested in some of the alphabets, and you may be interested in more functionality than just printing. For example you may need sorting, or character input with the right sort of interactive appearance changes, or equality testing.
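To make the "equality testing" point concrete, here is a minimal sketch in Python. It assumes that NFC normalization is the right notion of "equal" for your use case, which is an assumption and not something every application wants. The helper name `same_text` is made up for illustration.

```python
import unicodedata

# Two spellings of the same visible text "é".
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison looks at code points and sees two different strings.
print(composed == decomposed)

# Comparing normalized forms treats them as the same text.
print(same_text(composed, decomposed))
```

Sorting is a separate problem (locale-aware collation), and this sketch does not attempt it.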

If you decide what you want to work, then googling usually finds a suitable test quickly.


Right, but if you're making a word processor or a web forum or a registration form, what you want might be "Well, I don't speak languages that need complex scripts, but I'd be happy to support other people's scripts if it's easy."
The easiest test for "does my software handle Unicode somewhat better than dumbly" is emoji. If your users aren't already deluging you with emoji in their content in 2019, grab the emoji keyboard from your operating system; it is usually easy to find on "soft keyboard overlays" such as those on mobile platforms. (In Windows 10, for the last year or so, there are two keyboard shortcuts that work everywhere: Windows Key+. and Windows Key+;)

Many emoji these days are quite complex Unicode sequences. A number of them live in the so-called "Astral Planes", meaning their code points need more than 16 bits to represent (so handling them proves you aren't treating UTF-8 or UTF-16 as if it were UCS-2). As sequences they include a lot of fun non-visible code points ("characters") such as the Zero-Width Joiner, and they are very susceptible to breaking if a code point is accidentally dropped, reordered, or otherwise spliced (so handling them possibly proves you aren't doing bad string math or manipulation at the code point level rather than the glyph/sequence/combined-character level).
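A rough Python illustration of both points, using the "woman astronaut" sequence (WOMAN + Zero-Width Joiner + ROCKET). The helper `has_astral` is a name invented for this sketch, and the "naive" operations stand in for whatever string manipulation your own code does.

```python
# Woman astronaut: U+1F469 WOMAN, U+200D ZERO WIDTH JOINER, U+1F680 ROCKET.
astronaut = "\U0001F469\u200D\U0001F680"

# One visible glyph, but several different "lengths" depending on what you count.
code_points = len(astronaut)                           # Python counts code points
utf16_units = len(astronaut.encode("utf-16-le")) // 2  # 16-bit code units
utf8_bytes = len(astronaut.encode("utf-8"))            # bytes

print(code_points, utf16_units, utf8_bytes)


def has_astral(text: str) -> bool:
    """Hypothetical helper: True if any code point lies outside the 16-bit range."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(has_astral(astronaut))

# Code-point-level "string math" that breaks the sequence:
reversed_naive = astronaut[::-1]  # ROCKET, ZWJ, WOMAN: no longer the astronaut
truncated = astronaut[:2]         # WOMAN plus a dangling joiner

print(reversed_naive == astronaut, truncated.endswith("\u200D"))
```

Splitting, reversing, or truncating safely needs grapheme cluster segmentation, which the Python standard library does not provide, so this sketch only demonstrates the failure.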

[ETA: Useful sequences to test are any that support the skin-tone and gender modifiers. On Windows, the various "cat occupation" emoji are also interesting sequences such as ninja cat and astro cat. Other platforms have similar unique "fun" sequences that are noticeable at a glance when right/wrong.]
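A small sketch of how such sequences could be turned into a round-trip check. The code point spellings below are my best understanding of these sequences and are worth verifying against your platform's emoji picker. `store_and_load` and `strip_invisible` are hypothetical stand-ins for your own processing pipeline.

```python
# Candidate test strings (assumed spellings; verify against your platform).
SAMPLES = {
    "thumbs up, medium skin tone": "\U0001F44D\U0001F3FD",
    "woman shrugging": "\U0001F937\u200D\u2640\uFE0F",
    "ninja cat (Windows)": "\U0001F431\u200D\U0001F464",
    "astro cat (Windows)": "\U0001F431\u200D\U0001F680",
}


def survives(process, text: str) -> bool:
    """True if the text comes back unchanged from the given processing step."""
    return process(text) == text


def store_and_load(text: str) -> str:
    """Hypothetical stand-in for a correct pipeline: a UTF-8 round trip."""
    return text.encode("utf-8").decode("utf-8")


def strip_invisible(text: str) -> str:
    """Hypothetical stand-in for an over-eager cleanup that drops joiners."""
    return text.replace("\u200D", "")


# Names of the samples that the over-eager cleanup damages.
failures = [name for name, s in SAMPLES.items() if not survives(strip_invisible, s)]
print(failures)
```

Swapping `strip_invisible` for your real save/sanitize/render path gives a quick pass/fail list, though an automated check like this does not replace looking at the rendered result.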

It's not entirely true that if you support emoji well you support any Unicode user's script well, but if you support emoji well you probably don't do anything particularly stupid to make other Unicode users unhappy.

Indeed. And if you're doing other things your test is a different one. It depends.

BTW, if you want to discuss which languages' scripts are complex… office, office, office, office.