by jhallenworld
3777 days ago
Well, you are right, but these functions are not terribly fun to use. Consider a parsing function which extracts an identifier. For ASCII it's:

    if (isalpha(*s)) {
        *d++ = *s++;
        while (isalnum(*s))
            *d++ = *s++;
    }

Switching to UTF-8 / Unicode should require only small changes:

    if (iswalpha(decode(&s))) {
        encode(&d, advance(&s));
        while (iswalnum(decode(&s)))
            encode(&d, advance(&s));
    }

For efficiency, don't decode twice. Have the decoder also return a pointer to the next sequence:

    if (iswalpha(c = utf8(&s, &n))) {
        encode(&d, c);
        s = n;
        while (iswalnum(c = utf8(&s, &n))) {
            encode(&d, c);
            s = n;
        }
    }

You should also be able to match a string inline:

    if ('A' == utf8(&s, &t) && 'B' == utf8(&t, &s) && 'C' == utf8(&s, &t))
        /* we have 'ABC' */;
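
For what it's worth, here is a rough sketch of what `utf8()` and `encode()` could look like behind those calls. The signatures are my guess from the call sites, not an existing library API, and `copy_identifier()` is a made-up wrapper around the loop. It assumes NUL-terminated input, a `wint_t` wide enough for 21-bit code points (true with glibc, not on Windows), and it does not reject overlong forms or surrogates. Whether `iswalpha()` accepts non-ASCII letters depends on the current locale.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical decoder: decodes the sequence at *in WITHOUT advancing *in
 * and stores the address of the following sequence in *next.
 * Returns 0 at the terminating NUL (next is not moved past it) and WEOF on
 * a malformed sequence (next skips a single byte). */
wint_t utf8(const char **in, const char **next)
{
    const unsigned char *p = (const unsigned char *)*in;
    wint_t c;
    int extra;  /* number of continuation bytes expected */

    if (p[0] < 0x80)                { c = p[0];        extra = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { c = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { c = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { c = p[0] & 0x07; extra = 3; }
    else { *next = *in + 1; return WEOF; }  /* stray continuation or bad lead byte */

    for (int i = 1; i <= extra; i++) {
        /* A NUL fails this check, so we never read past the terminator. */
        if ((p[i] & 0xC0) != 0x80) { *next = *in + 1; return WEOF; }
        c = (c << 6) | (p[i] & 0x3F);
    }
    *next = *in + (p[0] ? extra + 1 : 0);
    return c;
}

/* Hypothetical encoder: writes c as UTF-8 at *out and advances *out. */
void encode(char **out, wint_t c)
{
    unsigned long u = (unsigned long)c;
    unsigned char *d = (unsigned char *)*out;

    if (u < 0x80) {
        *d++ = (unsigned char)u;
    } else if (u < 0x800) {
        *d++ = (unsigned char)(0xC0 | (u >> 6));
        *d++ = (unsigned char)(0x80 | (u & 0x3F));
    } else if (u < 0x10000) {
        *d++ = (unsigned char)(0xE0 | (u >> 12));
        *d++ = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (u & 0x3F));
    } else {
        *d++ = (unsigned char)(0xF0 | (u >> 18));
        *d++ = (unsigned char)(0x80 | ((u >> 12) & 0x3F));
        *d++ = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (u & 0x3F));
    }
    *out = (char *)d;
}

/* The identifier loop, wrapped in a function. Copies an identifier from
 * *src to dst (NUL-terminated), leaves *src at the first unconsumed byte,
 * and returns the number of bytes written. dst needs strlen(*src) + 1 bytes. */
size_t copy_identifier(const char **src, char *dst)
{
    const char *s = *src, *n;
    char *d = dst;
    wint_t c;

    if (iswalpha(c = utf8(&s, &n))) {
        encode(&d, c);
        s = n;
        while (iswalnum(c = utf8(&s, &n))) {
            encode(&d, c);
            s = n;
        }
    }
    *d = '\0';
    *src = s;
    return (size_t)(d - dst);
}
```

Since `utf8()` never moves its input pointer, the caller decides when to commit to `n`, which is what makes the ping-pong between `s` and `t` in the 'ABC' match work. To classify non-ASCII letters you would probably need `setlocale(LC_CTYPE, "")` with a UTF-8 locale first.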