| Hi wolfgarbe, I don't believe my benchmark of SymSpell is misleading. I used the WebAssembly repository that is listed on your GitHub: https://github.com/justinwilaby/spellchecker-wasm Here is the code I used for my benchmark: https://gist.github.com/Eratosthenes/bf8a6d1463d2dfb907fa13c... I reported the results faithfully and I believe these results reflect the performance that users would typically see running SymSpell in the browser, using the default configuration. If I had increased the edit distance, then the latency performance gap between Lexiathan and SymSpell would have been even larger, and then arguably I would have been gaming my metrics by not using SymSpell as it is configured. Regarding dictionary size: The dictionary size (as you can verify from the gist) was 82k words. I didn't specify the dictionary size I used for Lexiathan, but it was 106k words. Lastly, three of the words in the benchmark have edit distances greater than three: distance("pronnouncaition", "pronunciation") = 4; distance("maggnificntally", "magnificently") = 4; distance("annnesteasialgist", "anesthesiologist") = 6. So I do not believe SymSpell would correct these even with the edit distance increased to 3. |
That's the reason why the default maximum edit distance of SymSpell is 2.
Now, all 6 of your 6 examples are chosen from that 1.1% margin that is not covered by edit distance 2, each presenting an unusually high number of errors within a single word.
The third-party SymSpell port by Justin Wilaby, which you used for benchmarking, clearly states that you need to set both maxEditDistance and dictionaryEditDistance to a higher number if you want to correct higher edit distances. You neither did this nor mentioned it. This has nothing to do with accuracy; it is a choice about the tradeoff between performance and maximum edit distance, which one can make according to the use case at hand.
https://github.com/justinwilaby/spellchecker-wasm?tab=readme...
pronnouncaition IS within edit distance 3 of pronunciation, according to the Damerau-Levenshtein edit distance used by SymSpell. The reason is that adjacent transpositions are counted as a single edit. https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_di...
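To make the difference concrete, here is a minimal sketch in Python that compares plain Levenshtein distance with the restricted Damerau-Levenshtein (optimal string alignment) distance. This is not SymSpell's own code, and the function names `levenshtein` and `osa_distance` are made up for illustration. With transpositions counted as one edit, the three edits are: delete the extra "n", delete the extra "o", and swap "ai" to "ia".

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein: insertions, deletions, substitutions only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + int(ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment):
    like Levenshtein, but an adjacent transposition costs 1 edit.
    Illustrative only; a real library may implement this differently."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = int(a[i - 1] != b[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            # Adjacent transposition, e.g. "ai" <-> "ia".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("pronnouncaition", "pronunciation"))   # 4
print(osa_distance("pronnouncaition", "pronunciation"))  # 3
```

So the figure of 4 quoted above is the plain Levenshtein distance; once the "ai"/"ia" swap is counted as a single transposition, the distance drops to 3.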