by edwintorok 4140 days ago

Worth taking a look at the Unicode sentence segmentation rules: http://unicode.org/reports/tr29/#Sentence_Boundaries

Also at the CLDR sentence break suppressions: http://unicode.org/cldr/trac/browser/tags/release-27-0-1/com...

If your rules handle an edge case that the above don't, it'd probably be worth suggesting improvements to the Unicode rules or the locale-specific ones.
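
To make the idea of a suppression list concrete, here is a rough Python sketch. It is not the UAX #29 algorithm (which works on character properties and has many more rules), and the `SUPPRESSIONS` set and `split_sentences` name are made up for illustration; the real CLDR data is per-locale and much longer. It only shows the basic shape: find candidate breaks, then skip the ones that end in a known abbreviation.

```python
import re

# Hypothetical suppression list, for illustration only.
SUPPRESSIONS = {"Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc."}

# Candidate boundary: one or more terminators followed by whitespace.
_CANDIDATE = re.compile(r"[.!?]+\s+")


def split_sentences(text, suppressions=SUPPRESSIONS):
    """Split text at terminators, skipping breaks after suppressed tokens."""
    sentences = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        # Text from the last accepted break up to and including the terminator.
        chunk = text[start:match.end()].strip()
        last_token = chunk.split()[-1]
        if last_token in suppressions:
            continue  # e.g. "Dr." is an abbreviation, not a sentence end
        sentences.append(chunk)
        start = match.end()
    # Whatever follows the last accepted break is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With this sketch, `split_sentences("I saw Dr. Smith today. He was fine.")` yields two sentences rather than three, because the break after "Dr." is suppressed. An abbreviation that really does end a sentence would be missed, which is exactly the kind of edge case worth comparing against the Unicode and CLDR rules.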