by Royaljj 1227 days ago
Hi Colin, I’m curious how you search for repeated letters through an ngram index. I understand the example search for the string “limits” (find the intersection of “lim”, “imi”, “mit” and “its”). However, if the user wants to search for the string “aaaaa”, how would you go about that?

Good question. We still construct ngrams for it, exactly the same way. So for example, we might extract `aaa`, `aaa`, and `aaa`. Or we may extract `aaaa` and `aaaa`, or perhaps `aaaaa`. Then we deduplicate to find the unique ngrams and look them up in the index.
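
To make the dedup step concrete, here's a rough sketch. It assumes a fixed trigram window, which is a simplification (as noted above, the real ngrams may be longer), and `unique_ngrams` is just a name made up for illustration:

```python
def unique_ngrams(query: str, n: int = 3) -> set[str]:
    """Return the distinct n-grams of `query` using a fixed-size sliding window.

    Assumption: fixed-length n-grams. A real index may pick variable lengths.
    """
    # Building a set deduplicates repeated n-grams automatically.
    return {query[i:i + n] for i in range(len(query) - n + 1)}


print(sorted(unique_ngrams("limits")))  # ['imi', 'its', 'lim', 'mit']
print(sorted(unique_ngrams("aaaaa")))   # ['aaa']
```

With trigrams, `aaaaa` collapses to the single lookup key `aaa`.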

So it's possible that a document containing only `aaa` (and not the full `aaaaa`) might match our ngram search, but we double-check each candidate after retrieving it and exclude any false positives from the result set.
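
Roughly, the lookup-then-verify flow could look like the sketch below. This is a toy in-memory version under my own assumptions (fixed trigrams, a dict of sets as the index, a plain substring check as the verification step); `build_index` and `search` are hypothetical names, not the actual implementation:

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Distinct fixed-size n-grams of `text` (assumed trigram scheme)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs, n=3):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(query, docs, index, n=3):
    grams = ngrams(query, n)
    if not grams:
        # Query shorter than n: nothing to look up, so every doc is a candidate.
        candidates = set(docs)
    else:
        # Candidates must contain every unique n-gram of the query.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verification pass: drop candidates that don't contain the full query.
    return sorted(d for d in candidates if query in docs[d])


docs = {1: "baaab", 2: "xaaaaay", 3: "limits"}
index = build_index(docs)
print(search("aaaaa", docs, index))  # [2]
```

Here documents 1 and 2 both come back from the index lookup for `aaa`, and the verification pass drops document 1.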