by benbridle
433 days ago
For character sets, I was initially going to support non-Unicode text by adding a --char-set flag to the assembler, but I decided that the character set should instead be defined somehow inside each program. My thought was that each one could be defined as a large table-like macro, something like the following:
%BYTE:n #nnnn_nnnn ;
%CHAR:n
?[n 'A' ==] BYTE:0x01
?[n 'B' ==] BYTE:0x02
?[n 'C' ==] BYTE:0x03
?[n 'D' ==] BYTE:0x04 ;
CHAR:"ABCDABCD"
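For readers who don't know the macro syntax, here is a rough Python sketch of the lookup the CHAR table performs. It assumes the macro emits one byte per matching character; the names CHAR_TABLE and encode are made up for illustration, and raising an error on an unmapped character is my own choice, not something the macro is known to do.

```python
# Hypothetical table mirroring the four-entry CHAR macro above.
CHAR_TABLE = {"A": 0x01, "B": 0x02, "C": 0x03, "D": 0x04}

def encode(text: str, table: dict[str, int] = CHAR_TABLE) -> bytes:
    """Translate each character through the table, one output byte each."""
    out = bytearray()
    for ch in text:
        if ch not in table:
            # Assumption: treat an unmapped character as an error.
            raise ValueError(f"no mapping for character {ch!r}")
        out.append(table[ch])
    return bytes(out)

print(encode("ABCDABCD").hex(" "))  # 01 02 03 04 01 02 03 04
```

The macro table above expresses the same lookup as a chain of equality tests evaluated at assembly time.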
This approach is, admittedly, quite unwieldy for character sets exceeding a few hundred characters, but it would work passably for small character sets like those used for HD44780-style LCD screens. What character sets did you have in mind?
Octal was another feature I couldn't make up my mind about, just because I wasn't familiar with any architectures that require it. It'll be trivial to tack on though. For the Z80 instruction set, since the instruction encoding tends to cleave along octal lines, I used the following macro to pack octal digits into bytes, which has the advantage of allowing variables to be passed into each digit (the ADDr macro shows how it's used):
%XYZ:x:y:z #xxyyyzzz ;
%ADDr:r XYZ:2:0:r ;
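As a cross-check of the bit layout, the xxyyyzzz packing can be sketched in Python. The function names xyz and add_r are illustrative only, and the range check is an assumption on my part (the macro may well truncate instead). The Z80 encodes ADD A,r as 10 000 rrr, with B = 0 and A = 7.

```python
def xyz(x: int, y: int, z: int) -> int:
    """Pack a 2-bit field and two 3-bit fields into one byte as xxyyyzzz."""
    # Assumption: reject out-of-range fields rather than silently truncating.
    if not (0 <= x <= 0b11 and 0 <= y <= 0b111 and 0 <= z <= 0b111):
        raise ValueError("field out of range")
    return (x << 6) | (y << 3) | z

def add_r(r: int) -> int:
    """Mirror of the ADDr macro: x = 2, y = 0, z = register code."""
    return xyz(2, 0, r)

print(f"{add_r(0):#04x}")  # 0x80  (ADD A,B)
print(oct(add_r(7)))       # 0o207 (ADD A,A)
```

Reading the second result in octal shows the three digits 2, 0, 7 lining up with x, y and z.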
Thanks for the heads-up about the table of contents; the links should all work now.
|
For non-Unicode text, probably the simplest thing would be to treat the input as a sequence of bytes instead of Unicode characters, or equivalently to treat it as ISO-8859-1 (although decoding it as ISO-8859-1 may be less efficient than just using bytes; I don't know much about how Rust works, so I don't actually know whether it is).
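The equivalence between raw bytes and ISO-8859-1 can be checked with a short Python sketch (Python only for illustration; it says nothing about how the assembler itself handles input). The helper name decode_latin1 is made up.

```python
def decode_latin1(data: bytes) -> list[int]:
    """Decode as ISO-8859-1 and return the resulting code points."""
    return [ord(ch) for ch in data.decode("latin-1")]

every_byte = bytes(range(256))

# Each byte decodes to the code point with the same numeric value,
# so no input is ever rejected.
print(decode_latin1(every_byte) == list(every_byte))  # True

# The same input is rejected by a strict UTF-8 decoder.
try:
    every_byte.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8, first bad byte at offset", err.start)
```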
By "non-Unicode text", I did not mean character mapping, although character mapping is another feature that would be useful to implement, similar to what you mentioned, although it could be made more efficient (as you mention). Some way to map an input sequence of bytes (whether or not it is valid UTF-8) to an output character code would probably work.
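One possible shape for such a mapping, sketched in Python under my own assumptions: keys are arbitrary byte sequences, output codes fit in one byte, and the longest matching key wins. MAPPING, map_bytes and the specific codes are all hypothetical.

```python
# Hypothetical mapping from input byte sequences to output character codes.
MAPPING = {
    b"A": 0x41,
    b"\xc3\xa9": 0x82,  # the two UTF-8 bytes of "é" map to one output code
    b"\xff": 0x7F,      # not valid UTF-8 on its own, still mappable
}

def map_bytes(src: bytes, mapping: dict[bytes, int] = MAPPING) -> bytes:
    """Replace each longest-matching input sequence with its output code."""
    longest = max(len(key) for key in mapping)
    out = bytearray()
    i = 0
    while i < len(src):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(longest, len(src) - i), 0, -1):
            code = mapping.get(src[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:
            raise ValueError(f"no mapping at offset {i}")
    return bytes(out)

print(map_bytes(b"A\xc3\xa9\xff").hex(" "))  # 41 82 7f
```

A dictionary lookup like this avoids the chain of comparisons that a macro table implies, which is the kind of efficiency gain I had in mind.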