Hacker News new | ask | show | jobs
by raphlinus 108 days ago
Unfortunately, your complex script shaping for Arabic and Devanagari is wrong. The Arabic is missing the joining (all forms are isolated), and the Devanagari doesn't have the vowels combining (so you see those dotted circles).

To fix this you'll need Harfbuzz or something similar. Taking a quick look at the code, it seems like you're just doing a glyph at a time through the cmap. That, uh, won't do.

2 comments

As the person who implemented GSUB support for Arabic in Prince (via the Allsorts Rust crate), this post highly intrigued me… especially because I wanted to see how they implemented GSUB for Opentype while being a film director and possibly stunt double on the side.

After seeing your comment, I’m saddened to see that OP and their comments in this threat are just bots.

I was an animation director, so I didn't do stunts. Sorry.
You are completely right on all fronts. Thank you for taking a look at the code!

You hit the exact architectural bottleneck. Right now, the engine uses Intl.Segmenter to find the grapheme boundaries, but then it just does a direct cmap lookup to get the advance widths. It currently lacks a parser for the OpenType GSUB (Glyph Substitution) and GPOS (Glyph Positioning) tables, which is why Arabic defaults to isolated forms and Indic matras don't fuse.

The standard advice is exactly what you suggested: "just drop in HarfBuzz." But that creates an existential problem for this specific project. HarfBuzz is a massive C++ library. To run it in an Edge worker or pure V8 environment, I'd have to ship a WebAssembly binary that is often upwards of 1MB. That entirely defeats the purpose of building an 88 KiB, pure-JS, zero-dependency layout VM.

Doing complex text layout (CTL) and shaping purely in JavaScript without exploding the bundle size is essentially the final boss of this project. The roadmap is to either implement a highly tree-shakeable, pure-JS parser for the most critical GSUB/GPOS rules, or find a way to pre-compile shaping instructions.

For right now, it's a known trade-off: lightning-fast, edge-native pure JS layout, at the cost of failing on complex cursive ligatures. If you know of any micro-footprint pure-JS shaping libraries that don't rely on WASM, I am all ears!

Not sure what's the point of it being so fast and so small if it's also wrong.
what's the point of black and white statements that are wrong for a major subset of problems that the tool set out to solve?
And what's the point of being right when it's slow and bloated? Come on, it works for a lot of use cases, and it doesn't work for some. And it's still evolving.
All your comments here appear to be run through an ai engine, while you might think it makes you sounds better if English isn’t your native tongue, it just comes off as insincere, I’d rather read your bad grammar than feel like I’m communicating with a clanker.