by NDizzle
4515 days ago
I also took a few days a few weeks ago to set up Elasticsearch after my MySQL full-text search fell apart. What I'm doing is slamming the full-text output of OCR'd PDFs into a MyISAM table, with the entire document in a text field. What I'm afraid I'm not doing right is building the web interface to search Elasticsearch. What I'm using is filters with the query string syntax[1] in the search box, pointing directly at that full-text column. I'm also using the highlight functionality so that I can specify how many highlight blurbs to return with each result. The query string syntax works great with the OCR'd text: most of it is near-garbage (as most OCR output is), so you can search for something like "net sales"~50 to find those two terms within roughly 50 words of each other. I think the results were something like:
net sales: 15,000 results
"net sales": 120 results
"net sales"~50: 550 results

Can anyone point me at a good web-based search implementation using Elasticsearch that explains how they're doing it? What I have works pretty well; I just want to... check my work, I guess.

[1]: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
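For reference, a minimal sketch of the kind of request described above, written in Python as a plain request body (no client library). It assumes the document text lives in a field called `fulltext` (a hypothetical name) and that the search-box text is passed straight through as `query_string` syntax. `number_of_fragments` and `fragment_size` are the highlight options that control how many snippets come back per hit and roughly how long each is; the defaults of 3 and 150 are arbitrary.

```python
import json

# Hypothetical field name: the comment only says "that full-text column".
FULLTEXT_FIELD = "fulltext"


def build_search_body(user_query, fragments=3, fragment_size=150):
    """Build an Elasticsearch search request body (a plain dict).

    The search-box text is passed through unchanged as query_string
    syntax, so users can type things like "net sales"~50.
    """
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_field": FULLTEXT_FIELD,
            }
        },
        # Ask for a few highlighted snippets per hit instead of
        # showing the whole OCR'd document.
        "highlight": {
            "fields": {
                FULLTEXT_FIELD: {
                    "number_of_fragments": fragments,
                    "fragment_size": fragment_size,
                }
            }
        },
    }


if __name__ == "__main__":
    body = build_search_body('"net sales"~50')
    print(json.dumps(body, indent=2))
```

The body would be sent to something like `POST /<index>/_search`. One design note: `query_string` rejects malformed input (an unbalanced quote, say) with an error, so a public search box may want to catch that or fall back to `simple_query_string`, which ignores invalid syntax instead of failing.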
The main thing for good stability and performance is to be very good at batching your updates. You don't want to sling a ton of highly-parallel single-document updates at Lucene, lest you thrash the JVM and start garbage collecting like crazy.
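A minimal sketch of batching in Python: rather than one request per document, documents are grouped into newline-delimited `_bulk` request bodies. The index name `ocr_docs`, the `id` key on each document and the batch size of 500 are assumptions to tune, not recommendations from the comment; the action lines omit `_type`, which assumes Elasticsearch 7.x or later.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical index name for the OCR'd documents.
INDEX_NAME = "ocr_docs"


def bulk_bodies(docs: Iterable[Dict], batch_size: int = 500) -> Iterator[str]:
    """Yield newline-delimited _bulk request bodies of batch_size docs each.

    Each doc must carry an "id" key; the remaining keys are indexed as-is.
    """
    lines: List[str] = []
    count = 0
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "id"}
        # One action/metadata line, then the document source line.
        lines.append(json.dumps({"index": {"_index": INDEX_NAME, "_id": doc["id"]}}))
        lines.append(json.dumps(source))
        count += 1
        if count == batch_size:
            # The _bulk API expects the body to end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Flush the final, partially filled batch.
        yield "\n".join(lines) + "\n"
```

Each body would go to `POST /_bulk` with `Content-Type: application/x-ndjson`, one batch at a time or with a small, bounded number in flight, which is the opposite of the highly parallel single-document pattern warned about above.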
From there, on the query side, you'll want to get a good working knowledge of the different tokenization and analysis options. There are a lot of subtle and interesting combinations to be had in there that influence performance and relevance of your search results.
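As a small illustration of the analysis side, here is a Python sketch of index settings that define a custom analyzer and attach it to the text field, plus a request body for the `_analyze` API to see which tokens a snippet actually produces. The analyzer name `ocr_text`, the field name `fulltext` and the filter choice are hypothetical; `standard`, `lowercase` and `asciifolding` are built-in components, and the typeless `mappings` format assumes Elasticsearch 7.x or later.

```python
import json

# Hypothetical analyzer and field names; "standard", "lowercase" and
# "asciifolding" are built-in Elasticsearch components.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ocr_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    # Typeless mapping format (Elasticsearch 7.x and later).
    "mappings": {
        "properties": {
            "fulltext": {"type": "text", "analyzer": "ocr_text"}
        }
    },
}

# Body for the _analyze API: shows the tokens the analyzer emits for a snippet.
ANALYZE_REQUEST = {"analyzer": "ocr_text", "text": "Net Sales: $15,000"}

if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(json.dumps(ANALYZE_REQUEST))
```

The settings would be sent with `PUT /<index>` when the index is created, since the analyzer of an existing field generally can't be changed without reindexing; the second body goes to `POST /<index>/_analyze`.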