	by ryanfox 2869 days ago
	Could the bag-of-characters approach cause issues, since anagrams would have the same vector? It would be surprising (to me) to get the same results for e.g. "strange" and "garnets".

1 comment

misterman0 2869 days ago

Yes, this model could cause the issue you describe. With phrase queries (multi-token queries) it becomes less of a problem, since whole phrases are rarely anagrams of each other.
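
To make the collision concrete, here is a minimal sketch in Python. It assumes the vector is nothing more than per-character counts; `bag_of_chars` is a hypothetical name for illustration, not the model's actual code, which isn't shown in this thread.

```python
from collections import Counter

def bag_of_chars(term):
    # Assumption: a term's "vector" is just its character counts,
    # so the order of the characters is thrown away.
    return Counter(term.lower())

# Anagrams contain the same characters, so their bags compare equal.
same = bag_of_chars("strange") == bag_of_chars("garnets")

# One extra character is enough to tell two terms apart.
different = bag_of_chars("strange") != bag_of_chars("stranger")

print(same, different)
```

Under that assumption the two words are indistinguishable to the model, which is exactly the case you raised.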

A secondary index might be needed for the most popular terms, to resolve which anagram is the right one.
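
One way such a secondary index could look, as a rough sketch rather than anything that exists today: group the exact terms under a key shared by all their anagrams, then check the literal query term against that group. `AnagramIndex`, `bag_key`, `candidates` and `resolve` are made-up names for illustration.

```python
from collections import defaultdict

def bag_key(term):
    # Hypothetical key: the sorted characters, identical for all anagrams.
    return "".join(sorted(term.lower()))

class AnagramIndex:
    """Maps each bag-of-characters key to the exact terms that share it."""

    def __init__(self):
        self._terms = defaultdict(set)

    def add(self, term):
        self._terms[bag_key(term)].add(term)

    def candidates(self, term):
        # Every indexed term that collides with `term` in the bag model.
        return sorted(self._terms.get(bag_key(term), ()))

    def resolve(self, term):
        # Return the exact term if it was indexed, otherwise None.
        return term if term in self._terms.get(bag_key(term), ()) else None

index = AnagramIndex()
index.add("strange")
index.add("garnets")
print(index.candidates("strange"))
print(index.resolve("garnets"))
```

Restricting this to the most popular terms would keep the extra index small; whether that trade-off is worth it depends on how often collisions actually show up in real queries.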
