by eatonphil
1066 days ago
OK, one reason I can think of to be UTF-8 aware is so that error messages from any part of the parser can point to the exact column in the line of text. You could get the line number without being UTF-8 aware, but you couldn't get the column number.
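To make the column point concrete, here is a rough sketch (not taken from any particular parser) of how a byte offset turns into two different column numbers. The function name `column_of` is hypothetical, and it assumes lines are split on LF only and that a "column" means a count of Unicode code points, which is itself a simplification (grapheme clusters and wide characters are ignored).

```python
def column_of(source: bytes, offset: int) -> tuple[int, int]:
    """Return (byte_column, codepoint_column), both 1-based, for a byte offset.

    Assumption: lines are separated by LF (0x0A) only.
    """
    # Start of the current line: one past the previous LF, or 0 if none.
    line_start = source.rfind(b"\n", 0, offset) + 1
    prefix = source[line_start:offset]

    # Naive column: one column per byte.
    byte_col = len(prefix) + 1

    # UTF-8 aware column: skip continuation bytes (bit pattern 10xxxxxx),
    # so each multi-byte sequence counts once.
    codepoint_col = sum(1 for b in prefix if b & 0xC0 != 0x80) + 1
    return byte_col, codepoint_col
```

For the line `héllo = ?`, the `?` sits at byte column 10 but code point column 9, because `é` takes two bytes in UTF-8. A byte-oriented parser would place the caret one position too far to the right.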
|
Can you? Unicode recognizes all of the following as "new line" characters or sequences:
* U+000A Line Feed (LF) alone
* U+000D Carriage Return (CR) alone
* CRLF as one indivisible sequence
* U+000B Line Tabulation (VT) — supporting this is explicitly optional, and the main standard's newline function definition does not include it
* U+000C Form Feed (FF)
* U+0085 Next Line (NEL), an EBCDIC round-trip compatibility character
* U+2028 Line Separator (LS)
* U+2029 Paragraph Separator (PS)
My source: https://langdev.stackexchange.com/a/590/717
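As a rough illustration of why this matters for line numbers, here is a minimal sketch that counts lines using the list above. The names `UNICODE_NEWLINE` and `line_number` are made up for this example, and including VT is a choice (as noted, supporting it is optional).

```python
import re

# CRLF is listed first so it matches as one break rather than two.
# VT (\x0b) is included here by choice; drop it if you don't want it.
UNICODE_NEWLINE = re.compile("\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")


def line_number(text: str, index: int) -> int:
    """Return the 1-based line number of text[index].

    Counts every break that ends before `index`.
    """
    return len(UNICODE_NEWLINE.findall(text, 0, index)) + 1
```

For the string `"a\r\nb\u2028c\x85d"`, this puts `d` on line 4, whereas counting only LF bytes puts it on line 2. Note that NEL, LS, and PS are all multi-byte in UTF-8, so a scanner that only looks for the single bytes 0x0A and 0x0D will never see them.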