by derefr 2093 days ago
> But the internet leaves no room for someone with a pencil.

I've never personally seen it done, but couldn't you do this with the Unicode Private-Use Area? Assuming you have your own blog where you control the CSS, you could sprinkle in a few private-use codepoints, and then add a custom web font to the CSS font fallback that defines glyphs for those same codepoints.

(I know putting a document with Private-Use codepoints up on the public web would go against the philosophy of Unicode; and that the correct thing to do here would be to email the Unicode Consortium about the need for your character. [They'll probably agree!] But, ignoring the philosophy, in practice this would still work. And at least you're not directly confusing machines that try to extract semantics from the text, the way e.g. Wingdings fonts do.)
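To illustrate the "not directly confusing machines" point: software can at least tell that a codepoint is private-use, even though it can't know what it means. Here is a minimal Python sketch that relies on the general category `Co`, which Unicode assigns to every private-use codepoint. The helper names and the sample string are made up for illustration.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    # "Co" is the Unicode general category for private-use codepoints:
    # U+E000..U+F8FF in the BMP, plus planes 15 and 16.
    return unicodedata.category(ch) == "Co"

def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each private-use codepoint in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_private_use(ch)
    ]

# Hypothetical document text using U+E000 for a custom glyph.
sample = "3 apples \ue000 dozen"
print(find_private_use(sample))  # [(9, 'U+E000')]
```

A text pipeline could use a check like this to flag such characters as "present but unknown" instead of misreading them as ordinary letters.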

2 comments

This is generally a bad idea for accessibility reasons:

• Screen readers won’t be able to do anything with it;

• Some mobile users especially will turn off web font loading;

• Even if custom fonts are enabled, it’s not terribly uncommon for them to fail to load for any number of reasons;

• It’s best for performance if you can do something like `font-display: optional` or `font-display: fallback`; but if you are actually depending on glyphs in your font, you can’t do this. (For that font, anyway; if you use a custom font for just that range, you could still make any custom body text font optional, though at the cost of an extra request for the private use font rather than it being embedded in the rest.)

Sure, but all of those negatives are also true of the thing you’d use in place of a private-use codepoint: an inline-reflowed image.

Private-use codepoints at least have the advantage over inline images of being “opaquely” copy-and-paste-able into other documents, machine-readable, etc. Any system that works in terms of Unicode text will pass along the private-use codepoints in the stream, where it might strip higher-level out-of-band features like images.

As such, private-use codepoints are the analogue of the .notdef glyph in fonts, but for machine semantics rather than for human comprehension. In both cases, the “reader” (human or machine) gets something it doesn’t recognize but knows is present and valid. It can preserve that thing opaquely and pass it along, and the thing can potentially be “made legible” through the lens of a different eye than theirs.

One place I would see this as being useful is in the display of unique not-yet-formalized emoji in chat systems. Copy-and-pasting such “text” out of the system would just get you opaque PUA codepoints; but if you emailed such “text” to somebody, and then they copy-and-pasted it back into the chat system, they’d see the same emoji you saw originally. It’s like a public URL representing a private document that you have to be logged into the relevant system to “access.”

—————

The real negative of the Private-Use Area codepoints, from a conservationist/archivist perspective, is that unlike HTML images that each have a distinct—if opaque—URL, the Unicode Private-Use Area is quite limited, and so prone to collisions in usage.

If the Consortium had instead come up with a stringing scheme in which each private-use glyph was formed from a sequence of private-use combining codepoints [sort of like the flag combiners] that together encode e.g. a full UUID, then organizations could generate non-colliding private codepoints without any need for registration, using e.g. UUIDv4. They could then rely on the assumption that such codepoints only have semantics under their own private system, plus any other system that explicitly wants to be compatible with it. Today, by contrast, the same codepoints can have other, incompatible meanings in other systems that just happen to reuse them.
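A rough sketch of what such a stringing scheme could look like, in Python. Nothing like this exists in Unicode; as an assumption for illustration, I'm borrowing 16 codepoints at the start of Supplementary Private Use Area-A (U+F0000..U+F000F) as stand-ins for the "combining" codepoints, one per hex digit of the UUID. The function names are hypothetical too.

```python
import uuid

# Hypothetical: U+F0000..U+F000F each carry one hex digit (nibble).
# A real scheme would need dedicated codepoints assigned by the Consortium.
NIBBLE_BASE = 0xF0000

def uuid_to_sequence(u: uuid.UUID) -> str:
    """Encode a UUID as 32 codepoints, one per hex digit."""
    return "".join(chr(NIBBLE_BASE + int(digit, 16)) for digit in u.hex)

def sequence_to_uuid(seq: str) -> uuid.UUID:
    """Decode a 32-codepoint sequence back into the UUID it encodes."""
    hex_digits = "".join(format(ord(ch) - NIBBLE_BASE, "x") for ch in seq)
    return uuid.UUID(hex=hex_digits)

# An organization mints an identifier for its custom glyph with no registry.
glyph_id = uuid.uuid4()
encoded = uuid_to_sequence(glyph_id)

print(len(encoded))                          # 32
print(sequence_to_uuid(encoded) == glyph_id)  # True
```

At 32 codepoints per glyph this is wasteful; a denser alphabet would shorten the sequence, but the collision-avoidance argument is the same.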

Interestingly, such Private-Use UUID codepoint-sequences could then later be “adopted” into Unicode through a formal process. People who had created documents that used such meta-codepoints could register them with the Consortium, where the Consortium would 1. create “official” codepoints for those same semantics; and 2. ship a regularly-updated database file mapping meta-codepoints to later officially-registered codepoints. One pass of Unicode normalization would then involve using that database to replace private-use UUID codepoint-sequences with their registered full codepoint.
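The normalization pass could be as simple as a lookup-and-replace. Below is a Python sketch under loud assumptions: the UUID is spelled as 32 codepoints from U+F0000..U+F000F (one per hex digit, a made-up encoding), and `registry` stands in for the Consortium's database file. The UUID and its mapping to ⅌ (U+214C) are invented for the example.

```python
import re

NIBBLE_BASE = 0xF0000  # hypothetical: U+F0000..U+F000F carry one hex digit each
UUID_SEQUENCE = re.compile("[\U000F0000-\U000F000F]{32}")

def encode(hex_digits: str) -> str:
    """Spell a 32-digit hex UUID as a private-use codepoint sequence."""
    return "".join(chr(NIBBLE_BASE + int(d, 16)) for d in hex_digits)

def decode(seq: str) -> str:
    """Recover the hex UUID from a private-use codepoint sequence."""
    return "".join(format(ord(ch) - NIBBLE_BASE, "x") for ch in seq)

def adopt(text: str, registry: dict[str, str]) -> str:
    """Replace registered UUID sequences with their official codepoints.

    Unregistered sequences are passed through untouched.
    """
    def swap(match: re.Match) -> str:
        seq = match.group(0)
        return registry.get(decode(seq), seq)
    return UUID_SEQUENCE.sub(swap, text)

# Stand-in for the regularly updated database (contents invented).
registry = {"12345678123456781234567812345678": "\u214C"}

old_text = "3 apples " + encode("12345678123456781234567812345678") + " dozen"
print(adopt(old_text, registry))  # 3 apples ⅌ dozen
```

Because unregistered sequences survive the pass unchanged, old documents can be re-normalized whenever the database gains entries.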

Basically, this would take the thing that happened as a series of one-off events with Unicode codepage embeddings, and turn it into a continuous ongoing fine-grained process that anyone can take advantage of.

> Sure, but all of those negatives are also true of the thing you’d use in place of a private-use codepoint: an inline-reflowed image.

Not true: your inline image should have alt text, e.g. if ⅌ didn’t exist then you’d use an image of that shape with alt="per". If the image doesn’t load, it’ll be replaced by the word “per”, and screen readers will read it as “per” or “graphic per” or similar (I believe JAWS adds that “graphic” prefix, not sure if you can convince it not to by careful ARIA attributes—or even if you can, whether you should; these things are a bit dangerous to fiddle with).

Alternatively you might use inline SVG, which gets you vector goodness, and can definitely (rather than possibly) be presented to screen readers as the word “per” perfectly.

Another fancy trick is to use ligatures to replace entire words: make your own fancy web font replace the sequence “ per ” with “ ⅌ ”.

I think most people would just use inline images rather than go through the process of creating a font. That's not to say what you described doesn't happen: https://en.wikipedia.org/wiki/Medieval_Unicode_Font_Initiati...