| The revealing shibboleth is when people call it "ANSI". (-: "ANSI" is what people call it when they are working from paltry and incomplete samizdat doco of how this stuff works, from Microsoft's old ANSI.SYS appendix to its MS-DOS user manual, to innumerable modern WWW sites all repeating received wisdom. The thing to remember is that the "E" in "ECMA" does not stand for "ANSI". * https://ecma-international.org/publications-and-standards/st... * https://ecma-international.org/publications-and-standards/st... * https://www.itu.int/rec/T-REC-T.416-199303-I If you read ECMA-35, you'll find that there's actually a whole system to escape sequences and control sequences. As I pointed out last month, it's often the case that people who haven't read ECMA-35 don't realize that parameter characters can be more than digits, don't handle intermediate characters, and don't grasp how DEC's question mark and SCO's equals sign fit into the overall picture. People who haven't read ECMA-48 and traced its history don't realize that there's subtlety to missing parameters in control sequences. And people who haven't read ITU/IEC T.416 do what many of us did years ago and get 24-bit colour wrong. Twice over. * https://github.com/tattoy-org/tattoy/issues/105#issuecomment... Other common errors include missing out on all of the other 7-bit aliases for C1 characters. Or not realising that the ECMA-35/ECMA-48 syntax allows for any control sequence to have sub-parameters, not just SGR. Or using regular expressions and pattern matching instead of a state machine. Only a state machine truly handles the fact that, in the real world, terminals allowed, and enacted, various C0 and C1 control characters in the middle of control sequences, and also had ways of cancelling or restarting control sequences mid-sequence. * https://github.com/jdebp/nosh/blob/trunk/source/ECMA48Decode... But it gets even worse for a real world control sequence decoder. 
In the real world, not only do terminals interpret the same control sequences, and their parameters, differently depending on whether the terminal is sending or receiving them; but several terminal emulators like the one in Interix, rxvt, the one built into Linux, and even XTerm, send control sequences that not only break ECMA-35 but also conflict with received control sequences. So if one wants to be comprehensive and be capable of decoding real data, one needs a switch to tell the program whether to decode the character stream as if it is being received by the terminal or as if it is being sent by the terminal. * https://jdebp.uk/Softwares/nosh/guide/commands/console-decod... Microsoft Terminal tries to do things properly, which many modern terminal emulators and tools do not, and handles this with two distinct entire state machines, one for input and one for output. * https://github.com/microsoft/terminal/tree/main/src/terminal... I handled it with a few goto statements and a handful of flags. (-: * https://github.com/jdebp/nosh/blob/trunk/source/console-deco... |
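
To make the points above concrete, here is a rough Python sketch of the state-machine approach. It is emphatically not the nosh or Microsoft Terminal code: the `Decoder` class, the event tuples, and the `from_terminal` switch are all hypothetical names invented for illustration. It assumes the stream has already been decoded to code points, and it leaves out control strings (DCS, OSC and friends) and much else. What it does show is the 7-bit aliases for C1 characters, parameter characters beyond digits, intermediates, sub-parameters on any control sequence, missing parameters, C0 characters enacted mid-sequence, and CAN/SUB/ESC cancelling or restarting. The one direction-dependent case it handles is the Linux console's function keys, which send `ESC [ [ A` and so on, even though a `[` straight after CSI is a final character when received.

```python
from dataclasses import dataclass, field

ESC, CAN, SUB, CSI = 0x1B, 0x18, 0x1A, 0x9B


def parse_params(raw):
    """Split a parameter string into (private marker, list of parameters)."""
    marker = raw[0] if raw[:1] in ("<", "=", ">", "?") else ""
    body = raw[len(marker):]
    # ";" separates parameters and ":" separates sub-parameters; an empty or
    # non-numeric field becomes None, meaning "defaulted".
    return marker, [[int(s) if s.isdigit() else None for s in p.split(":")]
                    for p in body.split(";")]


@dataclass
class Decoder:
    from_terminal: bool = False  # hypothetical switch: decode as data SENT by the terminal
    state: str = "ground"        # "ground", "escape", "csi" or "linux_key"
    params: str = ""
    inters: str = ""
    events: list = field(default_factory=list)

    def feed(self, text):
        for ch in text:
            self._step(ord(ch))
        return self.events

    def _c1(self, cp):
        if cp == CSI:
            self.state, self.params, self.inters = "csi", "", ""
        else:  # simplification: control strings (DCS, OSC, ...) are not collected
            self.events.append(("C1", cp))
            self.state = "ground"

    def _step(self, cp):
        if cp in (CAN, SUB):         # cancel whatever is in progress
            self.state = "ground"
        elif cp == ESC:              # restart, abandoning any partial sequence
            self.state, self.inters = "escape", ""
        elif cp < 0x20:              # other C0: enact it, stay in the current state
            self.events.append(("C0", cp))
        elif 0x80 <= cp <= 0x9F:     # 8-bit C1
            self._c1(cp)
        elif self.state == "escape":
            self._escape(cp)
        elif self.state == "csi":
            self._csi(cp)
        elif self.state == "linux_key":
            self.events.append(("KEY", "[[" + chr(cp)))
            self.state = "ground"
        else:
            self.events.append(("PRINT", chr(cp)))

    def _escape(self, cp):
        if 0x20 <= cp <= 0x2F:       # intermediate character
            self.inters += chr(cp)
        elif 0x40 <= cp <= 0x5F and not self.inters:
            self._c1(cp + 0x40)      # ESC @ .. ESC _ alias C1 0x80 .. 0x9F
        elif 0x30 <= cp <= 0x7E:     # final character of an escape sequence
            self.events.append(("ESC", self.inters, chr(cp)))
            self.state = "ground"
        else:
            self.state = "ground"

    def _csi(self, cp):
        if self.from_terminal and cp == 0x5B and not (self.params or self.inters):
            self.state = "linux_key"  # Linux console function key: ESC [ [ x
        elif 0x30 <= cp <= 0x3F:      # parameter characters: 0-9 : ; < = > ?
            # simplification: parameters arriving after intermediates are not rejected
            self.params += chr(cp)
        elif 0x20 <= cp <= 0x2F:      # intermediate character
            self.inters += chr(cp)
        elif 0x40 <= cp <= 0x7E:      # final character
            marker, params = parse_params(self.params)
            self.events.append(("CSI", marker, params, self.inters, chr(cp)))
            self.state = "ground"
        # anything else (DEL, non-control high characters) is ignored in this sketch
```

With this sketch, `Decoder().feed("\x1b[38:2::255:0:0m")` yields a single `("CSI", "", [[38, 2, None, 255, 0, 0]], "", "m")` event, with the colon-separated sub-parameters and the missing one kept distinct. `Decoder().feed("\x1b[1\n;2H")` yields the line feed as a C0 event followed by the intact control sequence. And `"\x1b[[A"` decodes as a `KEY` event with `from_terminal=True`, but as a complete control sequence followed by a printed `A` without it.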