This type of full-text search works well on a small dataset, or on a dataset without a wide variety of data. It relies on App Engine's internal merge-join functionality, which, as the name suggests, takes two queries and joins their results. The problem is that the merge join has a very short time limit. I forget the exact timeout (it's undocumented), but it's something like 500 milliseconds.
The problem we ran into with the merge join functionality was the following:
Let's say you're searching for "lcd monitor". Your code could query for entities that match both lcd and monitor and let the merge join intersect the two result sets (e.g. select * from ngrams where ngram = 'lcd' and ngram = 'monitor'; an IN filter would instead return entities matching either term). There are many lcd monitors, so the merge join finds 1000 results very quickly.
Now say you search for "dell monitor". Unlike the previous search, there aren't many dell monitors, but there are lots of dell products and lots of monitors. The merge join has to step through a large number of dell entries and monitor entries to find the few that match both, and it times out before it can finish.
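To see why the two searches behave so differently, here is a simplified sketch of a merge join: two sorted lists of entity keys (one per term) are walked in step, and only keys present in both are kept. It is a plain linear merge; the real datastore can seek ahead in an index, so the step counts are only illustrative. The `merge_join` function, the synthetic key lists, and the `max_steps` budget (a stand-in for the undocumented timeout) are all hypothetical.

```python
from typing import List, Tuple


def merge_join(a: List[int], b: List[int], limit: int,
               max_steps: int) -> Tuple[List[int], int, bool]:
    """Intersect two sorted key lists, like a merge join over two index scans.

    Returns (matches, steps_used, timed_out). max_steps is a hypothetical
    stand-in for the datastore's undocumented time limit.
    """
    i = j = 0
    steps = 0
    matches: List[int] = []
    while i < len(a) and j < len(b) and len(matches) < limit:
        if steps >= max_steps:
            return matches, steps, True  # budget exhausted: "timeout"
        steps += 1
        if a[i] == b[j]:
            matches.append(a[i])  # key appears under both terms
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever scan is behind
        else:
            j += 1
    return matches, steps, False


# Dense overlap (synthetic "lcd" / "monitor" keys): many shared keys.
lcd_keys = list(range(0, 200_000, 2))
monitor_keys = list(range(0, 200_000, 4))
dense = merge_join(lcd_keys, monitor_keys, limit=1000, max_steps=5000)

# Sparse overlap (synthetic "dell" / "monitor" keys): only 4 shared keys.
dell_keys = [k for k in range(200_000) if k % 2 == 1 or k % 50_000 == 0]
monitor_keys_2 = list(range(0, 200_000, 2))
sparse = merge_join(dell_keys, monitor_keys_2, limit=1000, max_steps=5000)

print(len(dense[0]), dense[2])   # dense case result count, timed_out flag
print(sparse[0], sparse[2])      # sparse case matches, timed_out flag
```

With dense overlap (the lcd case), the 1000-result limit is reached after about 2000 steps. With sparse overlap (the dell case), the same budget runs out after a single match, because almost every step just advances past a key that appears under only one term.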
Also, it was VERY expensive to index every document (our data is in constant flux), so we decided to use a different solution.
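One way to see the indexing cost is to count index entries per document. The sketch below assumes a hypothetical scheme where each lowercase word and each of its prefixes (two characters or longer) becomes one value of an indexed list property, so every term is one index row to write, and every edit to the text means deleting and inserting rows for the terms that changed. `terms_for` and `index_writes` are illustrative names, not App Engine APIs.

```python
import re
from typing import Set, Tuple


def terms_for(text: str) -> Set[str]:
    """Hypothetical tokenizer: lowercase words plus their prefixes of length >= 2."""
    terms: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for n in range(2, len(word) + 1):
            terms.add(word[:n])
    return terms


def index_writes(old_text: str, new_text: str) -> Tuple[int, int]:
    """Return (entries_to_delete, entries_to_insert) for one document update."""
    old_terms, new_terms = terms_for(old_text), terms_for(new_text)
    return len(old_terms - new_terms), len(new_terms - old_terms)


# Indexing a new document from scratch, then editing one word of it.
first = index_writes("", "Dell 24 inch LCD monitor")
edit = index_writes("Dell 24 inch LCD monitor", "Dell 27 inch LCD monitor")
print(first, edit)
```

Under this scheme, indexing "Dell 24 inch LCD monitor" from scratch writes 15 term entries for a single entity, and each later edit adds a delete and an insert per changed term. When the data changes constantly, those writes add up across every document.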