|
|
|
|
|
by wolfgarbe
568 days ago
|
|
We started with making the core search technology faster. Then we added a Unicode character folding/normalization tokenizer (diacritics, accents, umlauts, bold, italic, full-width chars...). Last week we added a tokenizer that supports Chinese word segmentation. Currently, we are working on a multi-language tokenizer, that segments Chinese, Japanese an Korean without switching the tokenizer. |
|