Hacker News
by zacharynewton 1349 days ago
One of the great parts of the Python ecosystem for data processing is ftfy (https://ftfy.readthedocs.io/en/latest/), which can fix mojibake and many other Unicode-related text problems.
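For anyone unfamiliar with the term, here is a rough sketch of what mojibake is and how a single known mix-up can be reversed by hand. It uses only the standard library, and the helper name `undo_mojibake` and the choice of cp1252 as the wrong codec are assumptions for illustration. This is not how ftfy works internally; ftfy's `fix_text` detects and repairs many such mix-ups without being told which codec was involved.

```python
# Mojibake appears when UTF-8 bytes are decoded with the wrong codec.
original = "café"
garbled = original.encode("utf-8").decode("cp1252")  # "cafÃ©"


def undo_mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Hypothetical helper: reverse one specific UTF-8-read-as-cp1252 mix-up.

    Returns the text unchanged if the round trip is not possible,
    which usually means the text was not garbled this way.
    """
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


repaired = undo_mojibake(garbled)
```

The catch is that this only works when you already know which wrong codec was applied, which is exactly the information that tends to be missing.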

But seriously, I'm always a little upset when data vendors, customers, etc. don't specify the encoding they are using. You'd be surprised how many official or one-off sources still use weird encodings in the name of compatibility.

2 comments

If you want to test ftfy online it's available here:

https://ftfy.vercel.app/

ftfy is amazing. Really useful for processing Excel-generated CSVs.
Funny, I have used it for the same use case (and it's a sad reminder of how horrific Excel's handling of UTF-8 in CSV files can be...)
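On the Excel point, a minimal sketch of one common workaround, assuming the trouble is the usual one where Excel does not recognize a CSV as UTF-8 unless it starts with a byte order mark. Python's `utf-8-sig` codec writes that mark on encode and strips it on decode. The function names and sample rows below are made up for illustration.

```python
import csv
import io


def write_csv_for_excel(rows) -> bytes:
    """Serialize rows to CSV bytes with a UTF-8 BOM at the start."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends the BOM bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")


def read_csv_bytes(data: bytes):
    """Parse CSV bytes, dropping a leading UTF-8 BOM if one is present."""
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["name", "city"], ["Zoë", "Kraków"]]
data = write_csv_for_excel(rows)
round_tripped = read_csv_bytes(data)
```

This does not help with files that were already saved in a legacy code page; for those you still need to know or guess the encoding.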