Hacker News
by ynfnehf 614 days ago
The first place I read about this idea (specifically about newlines, not "trusting trust" in general) was day 42 in https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compile...

"For example, my compiler interprets "\n" (a sequence of backslash and character "n") in a string literal as "\n" (a newline character in this case). If you think about this, you would find this a little bit weird, because it does not have information as to the actual ASCII character code for "\n". The information about the character code is not present in the source code but passed on from a compiler compiling the compiler. Newline characters of my compiler can be traced back to GCC which compiled mine."

I was hoping GCC would do the same, leaving the value of '\n' up to whatever compiler compiled GCC, but apparently it hardcodes the numeric values for escapes[1], with options for ASCII or EBCDIC systems.

[1] https://github.com/gcc-mirror/gcc/blob/8a4a967a77cb937a2df45...

But those numeric values are themselves written as ASCII representations of numbers, rather than being the actual bytes written to the output. Maybe there is hope still: where do the byte values for those numbers come from when the compiler writes its output?

The C standard (see C23 5.2.1p3) requires the values of '0' through '9' to be contiguous, so the actual byte values don't matter if you only care about round-tripping: '7' - '0' == 7 no matter the character set. Strictly speaking, round-tripping doesn't even need that guarantee, but it certainly makes parsing and printing decimal notation very convenient. Notably, in both ASCII and EBCDIC, 'A'..'F' and 'a'..'f' are also contiguous, so hexadecimal can be parsed and printed much the same way as decimal.

Maybe the assembler?