by katet 2415 days ago
They are particularly horrendous. I've had the misfortune to work with government-provided PDFs that use custom font glyphs in lieu of proper encodings. In some cases this was the only way to encode particular languages/scripts before Unicode (Jawi was my personal experience). There are now better ways, but poorly exposed operating system support means most people with these needs still have custom fonts as the entrenched method of text entry.

Some of the encodings were so esoteric that we resorted to OCR instead of decoding the embedded text. It was quite frustrating to know that somebody, somewhere, knew what each octet represented, but it wasn't remotely Google-able (in English, at any rate).
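
For anyone wondering what "knowing what each octet represented" would have bought us: roughly a lookup table like the one below. This is only a sketch. The byte values and the name `HYPOTHETICAL_FONT_MAP` are invented for illustration and do not describe any real font; the real tables were exactly what we could not find.

```python
from typing import Mapping

# Hypothetical glyph table: these byte-to-character pairs are made up
# for illustration and do not correspond to any real custom font.
HYPOTHETICAL_FONT_MAP = {
    0x41: "\u0627",  # ARABIC LETTER ALEF
    0x42: "\u0628",  # ARABIC LETTER BEH
    0x43: "\u062A",  # ARABIC LETTER TEH
}


def decode_custom_font(
    raw: bytes,
    table: Mapping[int, str],
    unknown: str = "\uFFFD",
) -> str:
    """Map each octet to a Unicode character via the table.

    Octets missing from the table become U+FFFD (REPLACEMENT CHARACTER)
    so that gaps in the mapping stay visible instead of being dropped.
    """
    return "".join(table.get(octet, unknown) for octet in raw)


if __name__ == "__main__":
    decoded = decode_custom_font(b"ABC?", HYPOTHETICAL_FONT_MAP)
    # Print code points rather than glyphs so the result is easy to check.
    print([f"U+{ord(ch):04X}" for ch in decoded])
```

With a complete table the decode step is trivial; the hard part is reconstructing the table when nobody has documented it, which is why OCR ended up being the pragmatic route.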

(Tamil was also problematic, and still is, even with Unicode, as I understand it.)