Hacker News
by lindig 3325 days ago
TeX implements a very good hyphenation engine that is driven by patterns [1]. I would expect it to be very difficult to improve on this, and as far as I can see, the article doesn't include a comparison.

[1]: https://tex.stackexchange.com/questions/262588/how-are-hyphe...


He talks about exactly that kind of pattern matching and mentions LaTeX using it in the second section. He also says that this approach doesn't work as well with German compound words, which is the whole premise.
Giving one example is not an evaluation that would convince me that NNs are better. The German LaTeX community is one of the largest, and I haven't heard much about it being unhappy with TeX's hyphenation.
That word would be Nahrungsmittelunverträglichkeit again. I just tested it, and LaTeX (with `\usepackage[ngerman]{babel}`) makes the same mistake as pyphen in the article: it hyphenates the word as Nah-rung-smit-telun-ver-träg-lichkeit.
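
For anyone who wants to repeat the check, something like the following minimal document should do it. Only the `babel` line comes from the test described above; the document class and the `fontenc`/`inputenc` lines are assumptions on my part, and the break points you get may differ with the font encoding and the version of the German patterns installed.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}     % assumption: T1 encoding, so the umlaut is a single glyph
\usepackage[utf8]{inputenc}  % assumption: UTF-8 source (already the default in recent LaTeX)
\usepackage[ngerman]{babel}  % the setup named in the comment above
\begin{document}
% \showhyphens does not typeset anything; it writes the break points
% TeX finds for its argument to the terminal and the .log file.
\showhyphens{Nahrungsmittelunverträglichkeit}

% Typeset the word once as well, so the run produces a page.
Nahrungsmittelunverträglichkeit
\end{document}
```

The hyphenated form shows up in the log inside an "Underfull \hbox" message, where non-ASCII letters may be printed in `^^` notation.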

To be fair, in day-to-day use problems like these will be corner cases: to my knowledge, LaTeX tries to avoid hyphenation, and even when it has to split a word, it has a good chance of getting it right. Also, to me this project's focus was more on learning about neural networks than on creating a better hyphenator.

I think it's probably "good enough", but I don't think it's a given that this NN is worse. It could feasibly be translated to machine code that runs faster than TeX's hand-built algorithm, and it's also possible that it produces better results in most cases.