by staplung
635 days ago
Anyone know how it handles ligatures? Depending on the font and tooling, the word "fish" may end up in various docs as the glyphs [fi, s, h] or [f, i, s, h]. According to a quick check against /usr/share/dict/words, "fi" occurs in about 1.5% of words and "fl" in about 1%. Other ligatures sometimes occur, but I believe those two are the most common in English. I don't have any sense of how common ligature usage is anymore (I notice that the word "Office" in the title of this article is not rendered with a ligature by Chrome), but it might be insanity-inducing to end up on the wrong side of a failed search where ligatures were not normalized.
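
For what it's worth, here is a minimal sketch of what "normalizing" could look like, assuming the extracted text actually contains the Unicode ligature code points (U+FB00 to U+FB06, e.g. U+FB01 for "fi"). It is not what the tool under discussion does as far as I know; the helper names `normalize_ligatures` and `contains` are made up for illustration. NFKC compatibility normalization expands those ligatures back into plain letters, so both sides of a search can be folded the same way:

```python
import unicodedata


def normalize_ligatures(text: str) -> str:
    """Expand compatibility characters, including the Latin ligatures
    U+FB00..U+FB06, into their plain-letter equivalents via NFKC."""
    return unicodedata.normalize("NFKC", text)


def contains(haystack: str, needle: str) -> bool:
    """Case-insensitive substring search that treats ligatures and
    their spelled-out forms as equal (hypothetical helper)."""
    fold = lambda s: normalize_ligatures(s).casefold()
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    extracted = "\ufb01sh"           # "fish" stored as [fi-ligature, s, h]
    print(normalize_ligatures(extracted) == "fish")
    print(contains("O\ufb03ce 97 manual", "office"))  # U+FB03 is the "ffi" ligature
```

Two caveats: NFKC also rewrites other compatibility characters (full-width forms, superscripts and so on), which may or may not be what you want in an index, and none of this helps if a PDF maps the ligature glyph to some arbitrary code point with no usable Unicode mapping.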
Might be iffier in OCR mode: it seems to use Tesseract, which is known to have issues recognising ligatured text.