Hacker News
by mcbetz 675 days ago
Has anyone successfully worked with non-English text with FTS5 in SQLite? I could not find any reference for German, for example, and the default stemming does not seem to work properly (based on some short tests).
3 comments

> Has anyone successfully worked with non-English text with FTS5 in SQLite? I could not find any reference for German, for example

We use it in the Fossil SCM project and users have reported success with Chinese and Russian, so it presumably works fine with any European/Germanic language.

> and the default stemming does not seem to work properly (based on some short tests).

The Porter Stemmer is documented as only being useful for English.
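
To illustrate, here is a minimal sketch using Python's built-in sqlite3 module, assuming your SQLite build has FTS5 compiled in (most do, but not all). The table name `docs`, the sample rows, and the helper `hits` are made up for the example. It shows the porter tokenizer matching an English inflection while missing a German one:

```python
import sqlite3

# In-memory database; assumes the linked SQLite library was built with FTS5.
con = sqlite3.connect(":memory:")

# 'porter unicode61' wraps the default unicode61 tokenizer with the Porter stemmer.
con.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='porter unicode61')"
)
con.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [("running shoes",), ("Die Häuser sind alt",)],
)


def hits(query):
    """Return the bodies of all rows matching an FTS5 query string."""
    rows = con.execute("SELECT body FROM docs WHERE docs MATCH ?", (query,))
    return [row[0] for row in rows]


# English: 'running' and 'run' reduce to the same stem, so this matches.
english_hits = hits("run")

# German: the English rules do not relate 'Haus' to its plural 'Häuser'.
german_hits = hits("Haus")

print(english_hits, german_hits)
```

So the stemmer is doing what it is documented to do; it just applies English suffix rules to everything.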

With the Snowball stemmer extension, FTS5 gets pretty much the same support for other languages as most text mining tools and Elasticsearch: https://github.com/abiliojr/fts5-snowball

It should work well for German; I'm using it with Nordic languages.

I haven't tried it myself, but this problem seems to be related: https://stackoverflow.com/questions/45681645/how-to-enable-f...

You would just need to make sure the tokenizer treats the right characters for your target language as token characters (e.g. ÜüÖöÄäßẞ¹); maybe those are already covered by the unicode61 tokenizer.

¹: Yes, there is now a capital Eszett.
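
A quick way to check this yourself is a sketch like the following, again assuming Python's sqlite3 module with an FTS5-enabled SQLite build. The table name `notes`, the sample text, and the helper `count_matches` are hypothetical. It only tests lowercase queries containing ä and ß against the unicode61 tokenizer and makes no claim about how the capital Eszett is folded:

```python
import sqlite3

# Assumes the linked SQLite library was built with FTS5.
con = sqlite3.connect(":memory:")

# unicode61 is the default FTS5 tokenizer; it is named here only for clarity.
con.execute("CREATE VIRTUAL TABLE notes USING fts5(body, tokenize='unicode61')")
con.execute("INSERT INTO notes(body) VALUES (?)", ("Äpfel und Straße",))


def count_matches(query):
    """Return how many rows match the given FTS5 query string."""
    row = con.execute(
        "SELECT count(*) FROM notes WHERE notes MATCH ?", (query,)
    ).fetchone()
    return row[0]


# Umlauts and ß are kept inside tokens, and matching ignores case.
umlaut_matches = count_matches("äpfel")
eszett_matches = count_matches("straße")

print(umlaut_matches, eszett_matches)
```

If both counts come back as 1, the characters are being tokenized as part of words rather than treated as separators. Stemming is a separate question that this does not address.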