Hacker News
by colin353 1229 days ago
Good question. We still construct ngrams for it, exactly the same way. So for example, we might extract `aaa`, `aaa`, and `aaa`. Or we may extract `aaaa` and `aaaa`, or perhaps `aaaaa`. Then we deduplicate to find the unique ngrams and look them up in the index.
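
To make that concrete, here's a rough sketch of the extract-then-deduplicate step. It assumes the query is `aaaaa` and uses a plain sliding window; the function name `unique_ngrams` is made up for illustration and isn't our actual code.

```python
def unique_ngrams(query, n=3):
    """Slide a window of width n over the query and keep each distinct ngram once.

    Hypothetical helper for illustration; a query shorter than n yields an empty set.
    """
    return {query[i:i + n] for i in range(len(query) - n + 1)}


# "aaaaa" yields three overlapping trigrams, all identical, so one survives.
print(unique_ngrams("aaaaa", 3))  # {'aaa'}
# Two overlapping 4-grams, again identical.
print(unique_ngrams("aaaaa", 4))  # {'aaaa'}
```

Using a set means the index is only queried once per distinct ngram, no matter how repetitive the query is.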

So it's possible that a document containing `aaa` (but not the full query string) might match our ngram search, but we double-check each candidate after retrieving it and exclude the false positives from the result set.
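
A minimal sketch of that two-phase lookup, under some assumptions: the index is a toy in-memory dict from trigram to document ids, the query is `aaaaa`, and the verification is a plain substring check. The names `build_index`, `candidate_ids`, and `search` are hypothetical, not our real API.

```python
def ngrams(text, n=3):
    """Distinct ngrams of width n in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs, n=3):
    """Toy inverted index: ngram -> set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def candidate_ids(query, docs, index, n=3):
    """Phase 1: documents that contain every distinct ngram of the query."""
    grams = ngrams(query, n)
    if not grams:
        # Query shorter than n: the index can't narrow anything down.
        return set(docs)
    return set.intersection(*(index.get(g, set()) for g in grams))


def search(query, docs, index, n=3):
    """Phase 2: re-check each candidate and drop the false positives."""
    return {d for d in candidate_ids(query, docs, index, n) if query in docs[d]}


docs = {1: "xxaaaxx", 2: "aaaaaaa", 3: "bbb"}
index = build_index(docs)
print(candidate_ids("aaaaa", docs, index))  # {1, 2}
print(search("aaaaa", docs, index))         # {2}
```

Document 1 contains `aaa`, so it passes the ngram phase, but it never contains `aaaaa`, so the second pass drops it.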