True, although if there are lower case letters too it could become a problem. At any rate, I'm not sure that the register aspect would improve speed significantly because of the latency of getting the string from main memory. You're still going to get pipeline stalls as you reach portions of the string that are not in the cache.
Yeah, the registers will only be faster than manipulating some sort of bool array. Reading the original string (with associated cache stalls) will certainly be necessary.