Hacker News
by edwintorok 4093 days ago
It's worth taking a look at the sentence boundary rules in the Unicode text segmentation algorithm (UAX #29): http://unicode.org/reports/tr29/#Sentence_Boundaries

Also worth a look are the CLDR sentence break suppressions: http://unicode.org/cldr/trac/browser/tags/release-27-0-1/com...
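As a rough illustration of why a suppression list matters, here's a toy Python sketch. It is not an implementation of UAX #29 or of the CLDR data. The regex, the `split_sentences` function, and the short `SUPPRESSIONS` set are hypothetical stand-ins for the much more detailed real rules and per-locale lists.

```python
import re

# Hypothetical, heavily simplified suppression list in the spirit of the CLDR
# sentence break suppressions (the real per-locale lists are much longer).
SUPPRESSIONS = frozenset({"Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc."})

# Candidate boundary: one or more terminators, optional closing quotes or
# brackets, then whitespace. This only loosely approximates UAX #29 and
# ignores most of its character classes and special cases.
_CANDIDATE = re.compile(r"[.!?]+[\"')\]]*\s+")


def split_sentences(text: str, suppressions: frozenset[str] = SUPPRESSIONS) -> list[str]:
    """Split text at candidate boundaries unless the preceding token is suppressed."""
    sentences = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        # The whitespace-delimited token that ends at this terminator, e.g. "Mr.".
        token = text[start:match.end()].rstrip().split()[-1]
        if token in suppressions:
            continue  # Suppressed: keep accumulating the current sentence.
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Mr. Smith arrived. He sat down! Was it late? Yes."))
```

With the default set this prints `['Mr. Smith arrived.', 'He sat down!', 'Was it late?', 'Yes.']`; with an empty suppression set the same input also breaks after "Mr.", which is the kind of false boundary the CLDR lists exist to prevent.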

If your rules handle an edge case that the above don't, it'd probably be worth suggesting improvements to the Unicode rules or to the locale-specific ones.