by dbfclark 4317 days ago
This is a side effect of the default Unicode normalization, which is set to NFKC. NFKC is lossy for compatibility characters such as the ordinal indicator and the trademark symbol: it folds º to a plain o and ™ to TM. If you'd like to keep the ordinal indicator unchanged, use NFC normalization instead:

  >>> print(ftfy.fix_text(u'ordinal indicator º to o in addresses.'))
  ordinal indicator o to o in addresses.

  >>> print(ftfy.fix_text(u'ordinal indicator º to o in addresses.', normalization='NFC'))
  ordinal indicator º to o in addresses.
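
If you want to see the difference without ftfy in the picture, the standard library's unicodedata module should show the same behavior. This is just a rough sketch in Python 3; `normalize_both` is a helper name I made up for illustration, not part of ftfy or the stdlib:

```python
import unicodedata

# Masculine ordinal indicator (U+00BA) and trademark sign (U+2122):
# both carry compatibility decompositions, which NFKC applies and NFC does not.
samples = ["º", "™"]

def normalize_both(text):
    """Hypothetical helper: return the (NFC, NFKC) forms of text."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )

for ch in samples:
    nfc, nfkc = normalize_both(ch)
    print(f"U+{ord(ch):04X}: NFC={nfc!r} NFKC={nfkc!r}")
```

NFC only composes canonically equivalent sequences, so these characters pass through untouched; NFKC additionally replaces compatibility characters with their plain equivalents, which is where the information gets lost.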