| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ssubu 2382 days ago
	[Disclaimer: work at Cliqz] Both techniques have their share of upsides and downsides, infact we also use an FST based model to perform splits i.e donaldtrump ---> donald trump. The problem with the FST approach is when the prefix has a typo. I tried the spell correct on the website you linked to, it seems to fail when the first character is incorrect. Thanks for the information though, it is always interesting to read and learn from other approaches :)

1 comments

markpapadakis 2382 days ago

Could you please tell me what you entered?

As for prefixes and typos, in our case, the FST is used for both matching a prefix and spelling correction of the prefix.

The first character is treated specially; the chance of getting the first character wrong is very low(according to our findings) so only completions/corrections that score very high get to be matched.

link

ssubu 2382 days ago

Sure. cmputer gets corrected, not omputer.

Just to provide some additional information, our library Keyvi(https://github.com/KeyviDev/Keyvi) has a very fast implementation of an FST based spell correct.

link

markpapadakis 2382 days ago

Got it. This is by design - though I suspect we need to dial down the cost for misspelled first character in prefixes.

We built our own FST, which is somewhat similar to the design of the Rust fst crate ( https://blog.burntsushi.net/transducers/#fst-construction ).

link