Just that you would need to tokenize the right characters for your target language (e.g. ÜüÖöÄäßẞ¹), maybe those are already included in Unicode61.
¹: Yeah there is now a capital Eszett