Hacker News
by euske 1689 days ago
This is the reason why Adobe PDF doesn't rely on Unicode. Adobe products have had a huge presence in Japan since the 90s, and they had to appeal to the printing industry, which is very particular about this kind of issue. So they ended up using a separate encoding for every language. Today, CJK characters in PDF are encoded in Adobe-GB1 (mainland China), Adobe-CNS1 (Taiwan and Hong Kong), Adobe-Japan1 and Adobe-Korea1. Not the cleanest way, but it gets the job done.
3 comments

Thanks for the pointer, that's pretty interesting.

Looking at their doc [0], it seems they used Adobe-Japan1 to cover a much wider set of characters than any single encoding standard, including ligatures, glyphs from vintage encodings, etc.

It seems to be a pretty big piece of work, and it kinda fits the image of PDF handling being such a monumental beast.

[0] https://github.com/adobe-type-tools/Adobe-Japan1/

Note that they have since been adopted into the Unicode Ideographic Variation Database [1], alongside other variation collections.

[1] https://unicode.org/ivd/
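
For anyone wondering what an IVD entry looks like in actual text: an ideographic variation sequence is just a base ideograph followed by a variation selector from the U+E0100..U+E01EF range. Here is a minimal Python sketch that pairs each character with the selector after it. The helper names are made up for illustration, and the sample string is only an example; whether a given sequence is registered (and to which collection) has to be checked against the IVD data itself.

```python
from typing import List, Optional, Tuple

# Variation selectors used by ideographic variation sequences
# (VS17..VS256, i.e. U+E0100..U+E01EF).
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def is_ivs_selector(ch: str) -> bool:
    """Return True if ch is one of the IVS variation selectors."""
    return IVS_FIRST <= ord(ch) <= IVS_LAST


def split_variation_sequences(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with the IVS selector that follows it, if any.

    Hypothetical helper: a selector with no preceding base character is
    kept as its own entry rather than dropped.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    for ch in text:
        if is_ivs_selector(ch) and pairs and pairs[-1][1] is None:
            base, _ = pairs[-1]
            pairs[-1] = (base, ch)
        else:
            pairs.append((ch, None))
    return pairs


if __name__ == "__main__":
    # Same code point twice, each with a different selector (illustrative only).
    sample = "\u845B\U000E0100\u845B\U000E0101"
    for base, selector in split_variation_sequences(sample):
        label = f"U+{ord(selector):04X}" if selector else "none"
        print(f"U+{ord(base):04X} selector={label}")
```

The point is that both entries share one Unicode code point; only the selector tells the renderer which glyph variant is wanted, and a font that doesn't know the sequence can just fall back to its default glyph.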

Adobe gets a lot of stick for its subscriptions and the malware-like Creative Cloud, but they do spend a huge amount of resources on CJK fonts, layout, and encoding.

That's part of the reason why I like PDF.

(Behind a paywall) https://ken-lunde.medium.com/my-28-years-of-adobelife-e97e70...