| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by susam 598 days ago

I don't know about usefulness but it does let us write identifiers using Unicode characters. For example:

  public class Foo {
      public static void main(String[] args) {
          double \u03c0 = 3.14159265;
          System.out.println("\u03c0 = " + \u03c0);
      }
  }

Output:

  $ javac Foo.java && java Foo
  π = 3.14159265

Of course, nowadays we can simply write this with any decent editor:

  public class Foo {
      public static void main(String[] args) {
          double π = 3.14159265;
          System.out.println("π = " + π);
      }
  }

Support for Unicode escape sequences is a result of how the Java Language Specification (JLS) defines InputCharacter. Quoting from Section 3.4 of JLS <https://docs.oracle.com/javase/specs/jls/se23/jls23.pdf>:

  InputCharacter:
    UnicodeInputCharacter but not CR or LF

UnicodeInputCharacter is defined as the following in section 3.3:

  UnicodeInputCharacter:
    UnicodeEscape
    RawInputCharacter

  UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

  UnicodeMarker:
    u {u}

  HexDigit:
    (one of)
    0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

  RawInputCharacter:
    any Unicode character

As a result the lexical analyser honours Unicode escape sequences absolutely anywhere in the program text. For example, this is a valid Java program:

  public class Bar {
      public static void \u006d\u0061\u0069\u006e(String[] args) {
          System.out.println("hello, world");
      }
  }

Here is the output:

  $ javac Bar.java && java Bar
  hello, world

However, this is an incorrect Java program:

  public class Baz {
      // This comment contains \u6d.
      public static void main(String[] args) {
          System.out.println("hello, world");
      }
  }

Here is the error:

  $ javac Baz.java
  Baz.java:2: error: illegal unicode escape
      // This comment contains \u6d.
                                   ^
  1 error

Yes, this is an error even if the illegal Unicode escape sequence occurs in a comment!

1 comments

ivanjermakov 597 days ago

I wonder if full unicode range was accepted because some companies are writing code in non-english.

link