by dododo 5298 days ago
here's another "fun" grep locale oddity:

   $ echo HI | LANG=en_US.utf8 grep '^[a-z]'
   HI
   $ echo HI | LANG=C grep '^[a-z]'
   $
apparently en_{GB,US}.utf8 collates a-z like aAbBcC..zZ, so the range [a-z] covers A through Y but not Z:

   $ echo ZI | LANG=en_US.utf8 grep '^[a-z]'
   $
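If the goal is just "starts with a lowercase letter", one way to sidestep the collation order is to use the POSIX character class `[[:lower:]]` and pin the locale with `LC_ALL=C`, which overrides both `LANG` and `LC_COLLATE`. A minimal sketch, where the helper name `matches_lower` is made up for illustration:

```shell
# Sketch only: matches_lower is a hypothetical helper, not part of grep.
# It prints its argument only if the line starts with a lowercase letter.
matches_lower() {
    # [[:lower:]] is a character class, so it does not depend on how the
    # locale orders the range a-z. LC_ALL=C pins the locale for this call.
    printf '%s\n' "$1" | LC_ALL=C grep '^[[:lower:]]'
}

matches_lower HI || echo "HI: no match"   # prints "HI: no match"
matches_lower hi                          # prints "hi"
```

With this form, HI and ZI are both rejected, so the result no longer depends on where Z falls in the collation order.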
2 comments

This is what I get:

    $ echo HI | LANG=C grep '^[a-z]'
    $ echo HI | LANG=en_US.utf8 grep '^[a-z]'
    $ 
How come?
I was able to reproduce the bug. It could be a version thing.

  ; grep --version
  GNU grep 2.6.3
  ; echo A | LANG=en_US.utf8 grep '[a-z]'
  A
No, I have the same version but don't get the same result. I also have the en_US.utf8 locale installed.
I had the same problem with sort:

  $ sort <<EOF
  > Aa
  > aa
  > Ab
  > ab
  > EOF
  aa
  Aa
  ab
  Ab
I was going crazy because I was getting different results on OS X and Ubuntu. Setting LANG to POSIX fixed it.

    $ sort --version
    sort (GNU coreutils) 8.14
For what it's worth, this gets me the same results under LANG=C and LANG=en_US.utf8.
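One possible reason for seeing no difference between the two `LANG` values: if `LC_ALL` or `LC_COLLATE` is already set in the environment, it takes precedence over `LANG`, so changing `LANG` alone has no effect. This is only a guess for the cases in this thread. A minimal sketch of forcing byte-order collation for the sort example above, with `sort_bytes` as a made-up helper name:

```shell
# Sketch only: sort_bytes is a hypothetical wrapper, not a coreutils command.
# LC_ALL=C makes sort compare raw bytes, so 'A' (65) sorts before 'a' (97).
sort_bytes() {
    LC_ALL=C sort
}

printf '%s\n' Aa aa Ab ab | sort_bytes
# Output:
# Aa
# Ab
# aa
# ab
```

Running `locale` with no arguments prints the values currently in effect, which shows whether `LC_ALL` or `LC_COLLATE` is overriding `LANG`.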