|
|
|
|
|
by zzo38computer
433 days ago
|
|
I think the "0" prefix for octal (as used in the C programming language) is not so good and that "0o" is better; so, if octal literals are implemented then "0o" is better. For non-Unicode text, probably the simplest thing would be to treat the input as a sequence of bytes instead of Unicode characters; or equivalently to treat it as ISO-8859-1 (although programming it to use ISO-8859-1 may be less efficient then just using bytes, possibly; I don't know much about the working of Rust programming, so I don't actually know if it is or not). By "non-Unicode text", I did not mean character mapping, although character mapping is another feature that would be useful to implement, similar to what you mentioned although it could be made more efficient (like you mention). Some way to map a input sequence of bytes (whether or not it is valid UTF-8) to a output character code, would work, probably. |
|
I'm not too sure I understand what you're describing with non-Unicode text. Torque doesn't have a built-in concept of bytes, instead each character is treated as an integer with the value being the Unicode code point of that character (using decimal we have 65 for 'A', 955 for 'λ', 129302 for 'robot emoji', etc). It's up to the programmer to choose how to pack the character (integer) into a sequence of bits. Code points are different to encodings like UTF-8 or UTF-16, which define how a code point (integer) is packed down into one or more bytes.
If you want to assemble 7-bit ASCII text, one byte per character, you define a macro that packs each character value into the lower 7 bits of an 8-bit byte. If the string contains a non-ASCII character, the character value will be too large to fit into the field and an error will be displayed.
Assembling ISO-8859-1 text would be similar, but would involve remapping the characters above 0x7F like this: UTF-8 being a variable-width encoding requires a more complicated macro arrangement (you can see an example here [0]). But the key point is that strings aren't treated as byte sequences, they're just character/integer sequences until they get baked down into a byte encoding.Please let me know if that doesn't answer your suggestion, I'm keen to understand your use-case.
[0] https://benbridle.com/projects/torque/manual-v2.2.0.html#usi...