by rspeer 3088 days ago
In fact, ftfy already figures that text out! Here are the recovery steps that the website outputs:

    import ftfy.bad_codecs  # enables sloppy- codecs
    s = '!¡!HONDA POW'
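    # Undo two rounds of UTF-8 bytes being misread as Windows-1252
    s = s.encode('sloppy-windows-1252')
    s = s.decode('utf-8')
    s = s.encode('sloppy-windows-1252')
    s = s.decode('utf-8')
    # Undo one round of UTF-8 bytes being misread as Latin-1
    s = s.encode('latin-1')
    s = s.decode('utf-8')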
    print(s)
And the decoded text is (for some reason):

    !¡!HONDA POW
1 comments

Thank you, I'd also tested that but it seems to simply remove the mangled string part. Maybe it's impossible to recover it automatically after all :/
No, no. That is the recovered text.

Originally, the text had one non-ASCII character, an upside-down exclamation point. A series of unfortunate (but typical) things happened to that character, turning it into 9 characters of nonsense, the 9th of which is also an upside-down exclamation point.

It looks like ftfy is just removing the first 8 characters, but it's actually reversing a sequence of very specific encoding mix-ups that happened to the text. The result just happens to be the same as removing the first 8 characters.
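To make that concrete, here is a rough sketch of how a single upside-down exclamation point could grow into 9 characters, and how the same steps run backwards. It assumes the mistakes were exactly the reverse of the recovery steps listed above (one Latin-1 misread, then two Windows-1252 misreads). The helper names `mangle` and `unmangle` are made up for illustration, and the standard-library `cp1252` codec stands in for `sloppy-windows-1252`, which is enough here because none of the bytes involved fall on positions that `cp1252` leaves undefined.

```python
def mangle(text: str, wrong_codec: str) -> str:
    """Encode text as UTF-8, then misread those bytes using wrong_codec."""
    return text.encode("utf-8").decode(wrong_codec)


def unmangle(text: str, wrong_codec: str) -> str:
    """Undo one round of mangle() by re-encoding and decoding as UTF-8."""
    return text.encode(wrong_codec).decode("utf-8")


original = "\u00a1"  # the upside-down exclamation point

# Assumed order of mistakes: the recovery steps, read in reverse.
mangled = original
for codec in ("latin-1", "cp1252", "cp1252"):
    mangled = mangle(mangled, codec)

print(len(mangled))                # 9
print(mangled.endswith(original))  # True

# Reverse the mistakes in the opposite order.
recovered = mangled
for codec in ("cp1252", "cp1252", "latin-1"):
    recovered = unmangle(recovered, codec)

print(recovered == original)     # True
print(recovered == mangled[8:])  # True: same result as dropping 8 characters
```

The last check is the coincidence in question: the final byte of each round happens to map back to the same character, so the real fix and "drop the first 8 characters" give the same answer for this particular text.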