
by snikolaev 1424 days ago

> Manticore only has an ability to plug your own single filter in

It's true that Lucene has more token filters and is perhaps more flexible. That's partly because another focus of Manticore is simplicity and ease of use, so there is:

* just "charset_table=non_cjk", which is the default and should work for most languages (it already does case folding and accent folding)
* on top of that, you can apply one or more morphologies: stemmers (available for many languages via the libstemmer library, plus some built-in ones) and lemmatizers (available for English, German, Ukrainian and Russian)
* "charset_table=cjk" + "morphology=icu_chinese" to segment Chinese text
* you can combine all of that if you wish
* built-in stopwords for most languages

and of course prefix search, infix search etc.
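
To make the settings above a bit more concrete, here is a rough sketch of how they might be combined into a single table definition. The helper `build_create_table`, the table name `articles` and its columns are made up for illustration, and the option values (`stem_en`, `en`, `min_infix_len`) are my reading of the Manticore manual, so check them against the version you run.

```python
# Hypothetical helper: renders a CREATE TABLE statement with table-level
# text-processing settings. Not part of any Manticore client library.
def build_create_table(name, columns, settings):
    cols = ", ".join(f"{col} {typ}" for col, typ in columns.items())
    opts = " ".join(f"{key}='{value}'" for key, value in settings.items())
    return f"CREATE TABLE {name}({cols}) {opts}"


# Assumed option values; the option names follow the comment above.
settings = {
    "charset_table": "non_cjk",  # the default: case folding + accent folding
    "morphology": "stem_en",     # an English stemmer on top of that
    "stopwords": "en",           # built-in English stopword list
    "min_infix_len": "2",        # enables infix search from 2 characters
}

# Illustrative table name and columns.
statement = build_create_table(
    "articles", {"title": "text", "body": "text"}, settings
)
print(statement)

# The Chinese segmentation setup from the list above, same helper.
chinese_statement = build_create_table(
    "articles_zh",
    {"title": "text"},
    {"charset_table": "cjk", "morphology": "icu_chinese"},
)
print(chinese_statement)
```

The resulting strings can be sent over Manticore's MySQL-protocol interface with whatever client you already use; the sketch only builds the statements and does not connect to anything.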

But what we really want is that the default settings work fine in most cases.