| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by zzo38computer 430 days ago

I meant that the input should not have to be Unicode. The input could be any extended ASCII encoding (supporting ASCII is necessary, in order that the syntax will work; therefore some encodings such as Shift-JIS and UTF-16 are not suitable) which does not have to be mapped to Unicode. (Even if the input actually is UTF-8, you may have a character that is made up of multiple code points. If it is treated as a sequence of bytes then it can be any sequence of bytes, including a combination of multiple UTF-8 code points.)

It is another issue than mapping the character codes for output, which as you say can be unwieldly for large character sets, so another kind of table specification (which might also be useful for stuff other than text), would probably help and be more efficient than using a sequence of conditions in a macro.

About octal, someone said against using "0o"; it is usually not as difficult to distinguish if the "o" is always lowercase. Another alternative would be how it works in PostScript, which uses "8#" prefix for octal and "16#" prefix for hexadecimal. (My opinion is that "0o" is good enough though)

1 comments

benbridle 430 days ago

Oh I think I see what you mean now, you're talking about the encoding of the Torque source code that gets fed into the assembler. To be honest I'd never really considered anything other than UTF-8, the parsing is all implemented in Rust which requires strings to be valid UTF-8 anyway. Are you wanting to write Torque code with a non-Unicode text editor, or are you thinking about how the document encoding affects the handling of string literals inside the assembler?

Some kind of table syntax would be useful for character mappings, but I'm not sure what it'd look like or if it'd be applicable outside of dealing with characters. I'll think more on that.