by jprzybyl 3555 days ago
> Shouldn't rev(1) reverse graphemes instead of code points?

I honestly don't know. Is the intended purpose of this program to reverse bytes, or reverse characters, or reverse grapheme clusters? Or extended grapheme clusters?

There's no spec - this has never been in POSIX. What is your expected behavior? Is it mine?
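
To make the question concrete, here is a small Python sketch (not anything rev itself does, just an illustration) of two of those readings applied to the same line. The helper name `is_valid_utf8` is made up for this example.

```python
# Illustration only: two readings of "reverse a line" for one input.
line = "n\u00f4n"  # n, precomposed ô (U+00F4), n


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Reading 1: reverse bytes. The two-byte UTF-8 sequence for ô (C3 B4)
# gets flipped to B4 C3, which is no longer valid UTF-8.
reversed_bytes = line.encode("utf-8")[::-1]

# Reading 2: reverse code points. Fine here, because ô is one code point.
reversed_code_points = line[::-1]

print(reversed_bytes, is_valid_utf8(reversed_bytes))
print(reversed_code_points)
```

Reversing bytes breaks the encoding as soon as a multi-byte character appears, so the interesting disagreement is really between code points and grapheme clusters.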

For what it's worth, I needed rev recently, but forgot that it existed and did this:

    perl -ne 'chomp; print scalar reverse . "\n"'
If I need it to handle UTF-8 in a certain way, I can use pragmas to change its behavior. (I'm pretty sure that this, as it is, will ignore the surrounding locale.)

UTF-8 can also be handled in several ways. There is a lot of middle ground between software that only handles bytes and typesetting software that is fully Unicode-aware. The small Unix utilities fall somewhere in between.
> UTF-8 can also be handled in several ways.

UTF-8 cannot be handled in several ways without breaking it; it's a pretty straightforward and strict encoding.

What would you expect the output to be when the input is:

    nôn
    nôn
Sending that through rev (with a UTF-8 locale), I get:

    n̂on
    nôn
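
Judging by that output, the first input line seems to be n, o, combining circumflex (U+0302), n, and the second uses the precomposed ô (U+00F4). Under that assumption, a rough Python sketch reproduces the result and shows one alternative. The function names are made up, and `reverse_keeping_marks` is only an approximation of grapheme clusters (it glues combining marks to the preceding base character), not full UAX #29 segmentation.

```python
import unicodedata


def reverse_code_points(s: str) -> str:
    """Reverse code point by code point (what the rev output above looks like)."""
    return s[::-1]


def reverse_keeping_marks(s: str) -> str:
    """Reverse s, keeping each combining mark attached to the base
    character before it. Approximation only, not UAX #29."""
    clusters = []
    for ch in s:
        # combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


decomposed = "no\u0302n"   # assumed form of the first input line
precomposed = "n\u00f4n"   # assumed form of the second input line

for text in (decomposed, precomposed):
    print(reverse_code_points(text), reverse_keeping_marks(text))
```

With code point reversal the circumflex ends up attached to the leading n in the decomposed case; keeping marks with their base character gives the same visible result for both lines.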
By the way, did you know that perl's -l flag strips the trailing newline from each input line (when used with -n or -p) and appends one to every print? So your command could just be:

    perl -lne 'print scalar reverse'
And, for a version that treats input and output as UTF-8 and so reverses code points rather than bytes:

    perl -CDS -lne 'print scalar reverse'