by krosaen 5688 days ago
I've gotten full text search working using techniques similar to those described here:

http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-A...

2 comments

This type of full-text search works on a small dataset, or on one that doesn't have a wide variety of data. It uses the internal merge-join functionality of AppEngine, which, as the name suggests, takes two queries and joins them together. The problem is that the merge join has a very short time limit. I forget the exact timeout (it's undocumented), but it's something like 500 milliseconds.

The problem we ran into with the merge join functionality was the following:

Let's say you're searching for "lcd monitor". Your code could search for lcd and for monitor, then merge the results (select * from ngrams where ngram in ['lcd', 'monitor']). There are many lcd monitors, so the merge join will find 1000 results very quickly.

Let's say you search for "dell monitor". Unlike the previous search, there aren't many dell monitors, but there are lots of dell products and lots of monitors. Your merge join will time out: within the internal merge-join time limit, there isn't enough time to run a query for dell, another for monitor, and then merge the results.
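To make the difference between the two searches concrete, here is a rough Python sketch of a merge join over two sorted lists of document ids. It is a toy model, not AppEngine's actual implementation: the `merge_join` function, the `max_steps` budget (a stand-in for the undocumented time limit), and all the document ids are made up for illustration.

```python
from bisect import bisect_left


def merge_join(a, b, limit=1000, max_steps=None):
    """Intersect two sorted lists of doc ids.

    Returns (matches, steps). Raises TimeoutError once max_steps is used up,
    which is a hypothetical stand-in for the datastore's internal time limit.
    """
    i = j = steps = 0
    matches = []
    while i < len(a) and j < len(b) and len(matches) < limit:
        if max_steps is not None and steps >= max_steps:
            raise TimeoutError("merge join exceeded its step budget")
        steps += 1
        if a[i] == b[j]:
            # Same doc id in both lists: it matches both terms.
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Skip ahead in a to the first id that could match b[j].
            i = bisect_left(a, b[j], i + 1)
        else:
            # Skip ahead in b to the first id that could match a[i].
            j = bisect_left(b, a[i], j + 1)
    return matches, steps


# "lcd monitor": the two id lists overlap heavily (ids are invented).
lcd = list(range(0, 30000))
monitor = list(range(10000, 40000))
fast_hits, fast_steps = merge_join(lcd, monitor)

# "dell monitor": both lists are large and interleaved, but only two ids match.
dell = list(range(0, 20000, 2))
monitor2 = sorted(set(range(1, 20000, 2)) | {5000, 15000})
slow_hits, slow_steps = merge_join(dell, monitor2)

print(len(fast_hits), fast_steps)
print(len(slow_hits), slow_steps)
```

In this model the "lcd monitor" join fills its 1000-result limit after about one step per result, while the "dell monitor" join has to walk nearly all of both lists to find its two matches. With a budget such as `max_steps=5000`, the second join raises `TimeoutError` and the first does not.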

Also, it was VERY expensive to index every document (our data is in constant flux), so we decided to use a different solution.

Not sure if this will solve your problem, but have you looked at GAELucene? http://code.google.com/p/gaelucene/
I find it amusing that Google doesn't have a full-text search engine. I seem to remember them having code to do that...somewhere. Hmm.