Hacker News new | ask | show | jobs
by thaumasiotes 1186 days ago
It seems very clear that Amazon's default approach is to insert hyphens based on a whitelist of correct hyphenation points.

And that is what the algorithm you refer to does! Your links [1] and [2] speak specifically in terms of the patterns being a form of data compression that is applied to lighten the storage requirements of a big list of correct hyphenation points. The hyphenation algorithm is just that you check the word you want to hyphenate against the Master List Of All Words and learn where hyphenation is allowed. The patterns are a form of data preprocessing that makes that algorithm more efficient (here, in terms of space requirements) without changing the output.

So what we need is a way to extend the set of precomputed rules whenever we want to use a word that wasn't in the original dictionary. As noted, TeX provides this with the \hyphenation{} command. Why is this not available in CSS?

Suppose I want to write an ebook that doesn't make mistakes on the level of "fis-hing" and "f-orest". [Another example I'm not making up; the Kindle app is convinced that "Ts-inghua" is correct hyphenation.] How do I include the hyphenation information in my document?