by dunham 2782 days ago
If you have a Unicode library available, you might try asking it to convert the text to NFKD or NFKC normalization form. This will take apart ligatures (the former will also take apart accented characters).
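As one example of such a library, Python's standard `unicodedata` module exposes this through `unicodedata.normalize`. The sketch below uses an illustrative input string (not from the original comment): the "fi" ligature U+FB01 followed by "nancé" with a precomposed é (U+00E9). It shows that both forms split the ligature, and only NFKD also splits the accented letter into a base letter plus a combining accent.

```python
import unicodedata

# Illustrative input: U+FB01 (LATIN SMALL LIGATURE FI) + "nanc" + U+00E9 (precomposed é)
text = "\ufb01nanc\u00e9"

# NFKC: compatibility decomposition, then canonical recomposition.
# The ligature becomes "f" + "i"; é stays a single code point.
nfkc = unicodedata.normalize("NFKC", text)

# NFKD: compatibility decomposition without recomposition.
# The ligature is split, and é becomes "e" + U+0301 (COMBINING ACUTE ACCENT).
nfkd = unicodedata.normalize("NFKD", text)

print(len(text), len(nfkc), len(nfkd))   # 6 7 8
print([hex(ord(c)) for c in nfkd[-2:]])  # ['0x65', '0x301']
```

NFKC is usually the friendlier choice when you only want the ligatures gone, since accented letters come back in their composed form.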