by flareback 619 days ago
He gave 4 examples of how it's done incorrectly, but zero actual examples of doing it correctly.
3 comments

> Okay, so those are the problems. What’s the solution?

> If you need to perform a case mapping on a string, you can use LCMapStringEx with LCMAP_LOWERCASE or LCMAP_UPPERCASE, possibly with other flags like LCMAP_LINGUISTIC_CASING. If you use the International Components for Unicode (ICU) library, you can use u_strToUpper and u_strToLower.

The correct thing to do is to not do it at all. If text is 3rd-party supplied, treat it like an opaque byte sequence. Alternatively, pay a well-trained human to do it by hand.

All other options are going to result in edge cases where you're not handling it properly. It's like trying to programmatically split a full name into a first name and a last name: language doesn't work like that.

    for (int i = 0; i < strlen(s); i++) {
        s[i] ^= 0x20;
    }
Thank you for this universal approach. I can now toggle capitalization on/off for any character, instead of just being limited to alphabetic ones!

Jokes aside, I was kinda hoping for a good answer that doesn't rely on a Windows API or an external library, but I'm not sure there is one. It's a rather complex problem when you account for more than just ASCII and the English language.
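
For what it's worth, the ASCII-only part is easy to get right with no library at all. Here is a minimal sketch, assuming the input is ASCII or UTF-8 and that leaving every non-ASCII character unchanged is acceptable. The name ascii_upper is made up for illustration. It does not solve the real problem (no ß to SS, no Turkish dotless i, nothing locale-aware); it just avoids mangling the bytes it doesn't understand.

```c
#include <stddef.h>

/* Hypothetical helper, not from the article or any library.
   Uppercases only the bytes 'a'..'z' and leaves everything else alone:
   spaces, digits, punctuation, and UTF-8 multi-byte sequences
   (whose bytes are all >= 0x80) pass through unchanged. */
void ascii_upper(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] >= 'a' && s[i] <= 'z') {
            /* For ASCII letters this is the same as clearing bit 0x20. */
            s[i] = (char)(s[i] - ('a' - 'A'));
        }
    }
}
```

The range check is the whole point: the bit trick is only valid for bytes already known to be ASCII lowercase letters. Anything beyond that still needs LCMapStringEx, ICU, or similar.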

Next up, check out our vector addition implementation of Hello+World. Spoiler alert: the result is Zalgo.
Surely you meant:

    s[i] &= ~0x20;
We're talking about converting to upper case after all! As an added benefit, every space character (0x20) is now a NUL byte!
Free strtok!