by ignaloidas
1424 days ago
It seems so focused on how fast it responds to queries that it forgot how important the quality of its responses is. Lucene's main benefit is its huge library of token filters, which improve how well the indexed text is understood. Manticore only lets you plug in a single filter of your own. That doesn't make good search results impossible, but it definitely makes them a lot harder to achieve. Producing good results is the main goal of search; speed only matters to the extent that the search must not be too slow. A search that works well will justify any extra infrastructure it needs to run fast.

It's true that Lucene has more token filters and is perhaps more flexible. That's partly because another focus of Manticore is simplicity and ease of use, so there are:
* just "charset_table=non_cjk", which is the default and should work for most languages (it already does case folding and accent folding)
* on top of that, you can apply one or more morphologies: stemmers (available for many languages via the libstemmer library, plus some built-in stemmers) and lemmatizers (available for English, German, Ukrainian and Russian)
* "charset_table=cjk" + "morphology=icu_chinese" to segment Chinese text
* you can combine all of that if you wish
* built-in stopwords for most languages

and, of course, prefix search, infix search, etc.
But what we really want is that the default settings work fine in most cases.
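
To make the settings above a bit more concrete, here is a rough sketch of how they might be combined in a table definition. It only builds the SQL text and sends nothing to a server. The helper `create_table_sql`, the table names and the field names are made up for illustration; the option values `stem_en`, `stopwords='en'` and `min_infix_len` are my assumption of typical values and should be checked against the docs for your Manticore version.

```python
def create_table_sql(table, text_fields, **settings):
    """Build a CREATE TABLE statement as a string (hypothetical helper, nothing is executed)."""
    columns = ", ".join(f"{name} text" for name in text_fields)
    options = " ".join(f"{key}='{value}'" for key, value in settings.items())
    return f"CREATE TABLE {table}({columns}) {options}".strip()


# Non-CJK text: default charset table (case and accent folding),
# plus an English stemmer, built-in stopwords and infix search.
# The option values below are assumptions, not taken from the thread.
english_sql = create_table_sql(
    "articles",
    ["title", "body"],
    charset_table="non_cjk",
    morphology="stem_en",
    stopwords="en",
    min_infix_len="2",
)

# Chinese text: CJK charset table plus ICU-based segmentation,
# as mentioned in the list above.
chinese_sql = create_table_sql(
    "articles_zh",
    ["title", "body"],
    charset_table="cjk",
    morphology="icu_chinese",
)

print(english_sql)
print(chinese_sql)
```

The resulting statements could then be run through whatever SQL client you already use; leaving out every option is meant to give you the defaults.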