by ronjakoi 562 days ago
I hope the folding and normalization are configurable by language. I really hate it when a search decides that a and ä are the same letter. In Finnish they really aren't: "saari" is an island, "sääri" is the lower leg or shin.
1 comment

Currently, you can choose between tokenizers with or without folding. Making folding configurable per language, or letting users fully customize the folding logic, is a good idea.
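
For illustration, here is a rough sketch of what per-language folding could look like. This is not the project's actual API: the names `PRESERVE` and `fold` and the language codes are hypothetical, and the Finnish exception list is an assumption based on the example above. The idea is to strip combining marks via Unicode decomposition, except for letters a given language treats as distinct.

```python
import unicodedata

# Hypothetical per-language exceptions: letters that must survive folding.
# The Finnish set is an assumption for illustration, not an authoritative list.
PRESERVE = {
    "fi": set("äöåÄÖÅ"),
    "en": set(),
}

def fold(text: str, lang: str = "en") -> str:
    """Case-fold text and strip diacritics, except letters preserved for lang."""
    keep = PRESERVE.get(lang, set())
    out = []
    # Compose first so that "a" + combining diaeresis is seen as a single "ä".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        # Decompose the character and drop its combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out).casefold()

print(fold("sääri", "en"))  # saari (collides with "saari")
print(fold("sääri", "fi"))  # sääri (stays distinct from "saari")
```

With a table like this, an index built for Finnish keeps "saari" and "sääri" as separate terms, while other accents (for example the é in "café") are still folded.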