by ryanfox 2869 days ago
Could the bags-of-characters approach cause issues with anagrams having the same vector?

It would be surprising (to me) to get the same results for e.g. "strange" and "garnets".
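
To illustrate the concern, here is a minimal sketch in Python. It assumes a bag of characters is just a per-character count; the model under discussion may build its vectors differently, and `char_bag` is a hypothetical helper name.

```python
from collections import Counter

def char_bag(term: str) -> Counter:
    """Order-free character counts for a term (illustrative stand-in for the real vector)."""
    return Counter(term.lower())

# Anagrams contain the same characters, so their bags compare equal.
same = char_bag("strange") == char_bag("garnets")

# A term with one extra character gets a different bag.
different = char_bag("strange") == char_bag("stranger")

print(same, different)
```

Under this assumption, any two anagrams are indistinguishable because character order is discarded entirely.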

1 comment

Yes, this model could cause issues like the one you describe. With phrase queries or other multi-token queries it becomes less of a problem, since whole phrases are rarely anagrams of each other.

A secondary index of the most popular terms might be needed to resolve which anagram is the right one.
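
One possible shape for such a secondary index, as a rough Python sketch. The comment does not say how the right anagram would be chosen, so ranking candidates by popularity is an assumption here, and the function names and counts below are made up for illustration.

```python
from collections import defaultdict

def bag_key(term: str) -> str:
    """Canonical key shared by all anagrams of a term: its sorted characters."""
    return "".join(sorted(term.lower()))

def build_secondary_index(term_counts: dict) -> dict:
    """Group terms by anagram key, most popular first (assumed tie-break rule)."""
    index = defaultdict(list)
    for term, count in term_counts.items():
        index[bag_key(term)].append((count, term))
    for candidates in index.values():
        candidates.sort(reverse=True)
    return dict(index)

def resolve(index: dict, query: str):
    """Return the most popular known term sharing the query's characters, or None."""
    candidates = index.get(bag_key(query))
    return candidates[0][1] if candidates else None

# Hypothetical popularity counts, not real data.
counts = {"strange": 900, "garnets": 40, "listen": 500, "silent": 700}
index = build_secondary_index(counts)
```

With these made-up counts, `resolve(index, "garnets")` returns `"strange"`, which shows the trade-off: a popularity rule picks the likeliest term, but a rarer anagram still needs extra context (such as the rest of a phrase) to win.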