by jarek 4704 days ago
Somewhat related anecdote: for several years, Vimeo sent me newsletter emails addressed to "Dear Jarek_PiÃ³rkowski" (previously "Hi Jarek Pi??rkowski"). The ó that should be there shows up fine on the Vimeo website, and I even cleared and re-entered the name in my profile to give them a chance to re-encode it. It still continued.

I unsubscribed from the newsletter eventually.

And ó isn't even a difficult character, it's in ISO 8859-1 for crying out loud.

1 comment

Perfect example. That indicates that at some point, your UTF-8 data passed through a system that read it as Windows-1252.

http://www.i18nqa.com/debug/utf8-debug.html
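For anyone who wants to see the mechanism, here is a minimal Python sketch of that kind of garbling. It assumes the name was stored as UTF-8 and decoded as Windows-1252 somewhere along the way. The ASCII-only step that yields the question marks is purely my guess at how the earlier variant could have come about, not something known about Vimeo's pipeline.

```python
name = "Piórkowski"

# UTF-8 stores ó (U+00F3) as two bytes: 0xC3 0xB3.
utf8_bytes = name.encode("utf-8")

# A system that assumes Windows-1252 reads each byte as a separate character:
# 0xC3 becomes Ã and 0xB3 becomes ³.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # PiÃ³rkowski

# Assumption: an ASCII-only step downstream replaces each of those two
# characters with "?", giving two question marks for one original letter.
ascii_only = garbled.encode("ascii", errors="replace").decode("ascii")
print(ascii_only)  # Pi??rkowski

# In this simple case the damage is reversible: undo the wrong decode,
# then decode the recovered bytes as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # Piórkowski
```

The two-characters-for-one pattern is the giveaway: one accented letter turning into exactly two junk characters usually means UTF-8 bytes were read with a single-byte encoding.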

I expect Vimeo used a Linux system to collect your data, and I bet the thing that blasts emails out is ultimately Linux as well. So the Windows-1252 bungle probably happened in a third system in between, maybe a Windows system chosen for its ease of administration by the community managers.

Not that this is relevant to data sanitization (they're just being fuckups here) but it shows how complex this can get.