by agentS
4501 days ago
One of the reasons that spelling corrections are so good on Google is (probably) that they come from machine learning models trained on query logs (1). For example, if you search for "hacer news" and then, without clicking on any results, issue another query for "hacker news" within a very short timeframe, the model learns that "hacker" is a good suggestion for "hacer" (2). The same goes for "Mark as Spam", Priority Inbox, Recommended Videos on YouTube, Voice Recognition on Android, etc.

Note 1: Yes, you could also do a pretty good job by having a model of your problem, e.g. computing a weighted Levenshtein distance where the weights are the probabilities of making each error. However, I'd argue that this would still be better with centralized data: you can estimate those probabilities much more accurately. And regardless, the best solutions in the field will combine both.

Note 2: All of the above is speculation. While I help write some of the tools that these guys use, I have no knowledge of how they write their software. This is just how I'd do it.
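To make the two ideas above concrete, here is a minimal Python sketch, in the same speculative spirit as the comment. Everything in it is an assumption made for illustration: the function names `count_rewrites` and `weighted_levenshtein`, the `(timestamp, query, clicked)` event format, the 10-second window, and the example cost values. It is not a description of how Google actually does this.

```python
from collections import Counter


def count_rewrites(events, max_gap=10.0):
    """Count (query, follow-up query) pairs that look like self-corrections.

    `events` is a time-ordered list of (timestamp_seconds, query, clicked)
    tuples for a single user (a hypothetical log format). A pair is counted
    when the first query got no click and a different query followed
    within `max_gap` seconds.
    """
    pairs = Counter()
    for (t1, q1, clicked1), (t2, q2, _) in zip(events, events[1:]):
        if not clicked1 and q1 != q2 and (t2 - t1) <= max_gap:
            pairs[(q1, q2)] += 1
    return pairs


def weighted_levenshtein(source, target, sub_cost=None, indel_cost=1.0):
    """Edit distance where substituting x for y costs sub_cost[(x, y)].

    Pairs missing from `sub_cost` cost 1.0; insertions and deletions cost
    `indel_cost`. Likely typos would be given lower costs, for instance
    costs derived from error probabilities estimated from logs.
    """
    sub_cost = sub_cost or {}
    # prev[j] holds the distance between source[:i-1] and target[:j].
    prev = [j * indel_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * indel_cost]
        for j, t in enumerate(target, start=1):
            if s == t:
                substitute = prev[j - 1]
            else:
                substitute = prev[j - 1] + sub_cost.get((s, t), 1.0)
            curr.append(min(prev[j] + indel_cost,      # delete s
                            curr[j - 1] + indel_cost,  # insert t
                            substitute))
        prev = curr
    return prev[-1]


log = [
    (0.0, "hacer news", False),   # no click on any result
    (4.0, "hacker news", True),   # quick re-query, then a click
    (300.0, "weather", False),
]
print(count_rewrites(log))
print(weighted_levenshtein("hacer", "hacker"))
```

The first function is the log-driven signal; the second is the "model of your problem" from Note 1. Combining them would mean using counts like those from `count_rewrites`, aggregated over many users, to set the entries of `sub_cost`.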