
by jhallenworld 3777 days ago

Well, you're right, but these functions are not much fun to use. Consider a parsing function that extracts an identifier. For ASCII it's:

    if (isalpha((unsigned char)*s)) {
        *d++ = *s++;
        while (isalnum((unsigned char)*s))
            *d++ = *s++;
    }

Supporting UTF-8 / Unicode should require only small changes:

    if (iswalpha(decode(&s))) {
        encode(&d, advance(&s));
        while (iswalnum(decode(&s)))
            encode(&d, advance(&s));
    }
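The snippet relies on three helpers it never defines. Here is a minimal sketch of them, assuming NUL-terminated input: decode() peeks at the code point under s, advance() returns it and steps past it, and encode() writes a code point as UTF-8 and moves d. Only the names come from the comment. The signatures, the parse_ident() wrapper, and turning malformed bytes into U+FFFD are assumptions. iswalpha() and iswalnum() are standard <wctype.h> functions, but they generally classify non-ASCII characters only after setlocale() selects a locale with Unicode character data. They also assume wchar_t holds code points, which fails above U+FFFF on Windows, where it is 16 bits.

```c
#include <stdint.h>
#include <wctype.h>

/* Hypothetical helpers named after the snippet above. Assumes NUL-terminated
 * input; a malformed or truncated sequence decodes as U+FFFD. Overlong forms
 * and surrogates are not rejected in this sketch. */

/* Return the code point at *sp and move *sp past its UTF-8 sequence. */
static uint32_t advance(const char **sp)
{
    const unsigned char *s = (const unsigned char *)*sp;
    uint32_t c;
    int len, i;

    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else { *sp += 1; return 0xFFFD; }       /* invalid lead byte */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {        /* truncated: stop before the bad byte */
            *sp += i;
            return 0xFFFD;
        }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *sp += len;
    return c;
}

/* Same as advance(), but leaves *sp where it was (a peek). */
static uint32_t decode(const char **sp)
{
    const char *tmp = *sp;
    return advance(&tmp);
}

/* Write c (assumed <= 0x10FFFF) as UTF-8 at *dp and move *dp past it. */
static void encode(char **dp, uint32_t c)
{
    unsigned char *d = (unsigned char *)*dp;

    if (c < 0x80) {
        *d++ = (unsigned char)c;
    } else if (c < 0x800) {
        *d++ = (unsigned char)(0xC0 | (c >> 6));
        *d++ = (unsigned char)(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        *d++ = (unsigned char)(0xE0 | (c >> 12));
        *d++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (c & 0x3F));
    } else {
        *d++ = (unsigned char)(0xF0 | (c >> 18));
        *d++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        *d++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        *d++ = (unsigned char)(0x80 | (c & 0x3F));
    }
    *dp = (char *)d;
}

/* Hypothetical wrapper around the identifier scan: NUL-terminates d and
 * returns the position just after the identifier. */
static const char *parse_ident(const char *s, char *d)
{
    if (iswalpha(decode(&s))) {
        encode(&d, advance(&s));
        while (iswalnum(decode(&s)))
            encode(&d, advance(&s));
    }
    *d = '\0';
    return s;
}
```

For example, parse_ident("x1 = 2", buf) leaves "x1" in buf and returns a pointer to the space. Note that each character is decoded twice, once by decode() and again by advance().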

For efficiency, don't decode twice; have the decoder also return a pointer to the next sequence:

    if (iswalpha(c = utf8(&s, &n))) {
        encode(&d, c);
        s = n;
        while (iswalnum(c = utf8(&s, &n))) {
            encode(&d, c);
            s = n;
        }
    }
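A sketch of such a single-pass decoder, under the same assumptions (NUL-terminated input, U+FFFD for malformed bytes). utf8() is hypothetical. It returns the code point at *sp and stores the start of the next sequence in *np, leaving *sp alone, so the caller commits with s = n only when it accepts the character. Because both input and output are UTF-8, this scan_ident() variant also skips encode(). It copies the bytes from s up to n, which leaves s equal to n as a side effect.

```c
#include <stdint.h>
#include <wctype.h>

/* Hypothetical decoder matching utf8(&s, &n): returns the code point at *sp
 * and stores a pointer to the following sequence in *np; *sp is unchanged.
 * Malformed or truncated sequences yield U+FFFD. Overlong forms and
 * surrogates are not rejected in this sketch. */
static uint32_t utf8(const char **sp, const char **np)
{
    const unsigned char *s = (const unsigned char *)*sp;
    uint32_t c;
    int len, i;

    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else { *np = *sp + 1; return 0xFFFD; }  /* stray continuation or bad lead */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {        /* truncated: next starts at bad byte */
            *np = *sp + i;
            return 0xFFFD;
        }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *np = *sp + len;
    return c;
}

/* Hypothetical identifier scan that decodes each character once and copies
 * its UTF-8 bytes verbatim instead of re-encoding the code point. */
static const char *scan_ident(const char *s, char *d)
{
    const char *n;

    if (iswalpha(utf8(&s, &n))) {
        while (s < n)               /* copy this sequence; ends with s == n */
            *d++ = *s++;
        while (iswalnum(utf8(&s, &n)))
            while (s < n)
                *d++ = *s++;
    }
    *d = '\0';
    return s;
}
```

On "id_2", scan_ident() copies "id" and stops at the underscore, which is neither a letter nor a digit.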

It should also be possible to match a string inline:

    if ('A' == utf8(&s, &t) && 'B' == utf8(&t, &s) && 'C' == utf8(&s, &t)) // we have 'ABC'.
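One caveat with the chained calls is that they reuse s as scratch. If 'A' matches but 'B' does not, s no longer points where the scan started. The sketch below uses a different technique that skips decoding entirely. Every code point has exactly one valid UTF-8 encoding, so well-formed input begins with a given character sequence exactly when its bytes begin with that sequence's UTF-8 bytes. A byte-wise prefix test is therefore enough. match() is a hypothetical helper, not something from the comment. It advances *sp only when the whole literal matches.

```c
#include <string.h>

/* Hypothetical helper: if *sp begins with the UTF-8 literal lit, move *sp
 * past it and return 1; otherwise leave *sp untouched and return 0.
 * strncmp() stops at the input's NUL, so a short input simply fails. */
static int match(const char **sp, const char *lit)
{
    size_t len = strlen(lit);

    if (strncmp(*sp, lit, len) != 0)
        return 0;
    *sp += len;
    return 1;
}
```

With this helper the inline test becomes if (match(&s, "ABC")), and the same call works unchanged for non-ASCII literals such as "\xC3\xA9t\xC3\xA9" ("été").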