by bloak
843 days ago
I totally agree with the idea that balanced quotes are needed to make quoting sane. If the quotes in a string are balanced then it should be possible to quote it with no changes.
I would also advocate the principle that you don't escape the escape character by doubling it. There are two problems with replacing \ with \\: firstly the number of backslashes doubles with each level of nested quotation; secondly you can't tell at a glance whether \\\\\\\\\\\\\\\\\\\n contains a newline character or an n because it depends on whether the number of backslashes is odd or even.
Another useful principle is to escape a quote character with a sequence that does not contain that character: then it is much easier to check whether the quotes are balanced because you don't need to check whether any of them are escaped.
So here's a possible algorithm for quoting a string: first identify the top-level quote characters that don't match (this is not totally trivial but it isn't difficult or computationally expensive); then, in parts of the string that are not inside nested quotes, but only there, replace each unmatched « with \<, each unmatched » with \>, and each \ with \_ (say). Does that work?
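To make the proposal concrete, here is a minimal Python sketch of one reading of it. The function names (`unmatched_quotes`, `quote`, `unquote`) and the choice to reject unknown escapes are my own assumptions, not part of the proposal. Unmatched quotes are found with a stack, and only text at nesting depth 0 is escaped.

```python
OPEN, CLOSE, ESC = "«", "»", "\\"
DECODE = {"<": OPEN, ">": CLOSE, "_": ESC}  # escape letter -> literal character


def unmatched_quotes(s):
    """Return the indices of quote characters that have no partner."""
    stack, unmatched = [], set()
    for i, ch in enumerate(s):
        if ch == OPEN:
            stack.append(i)
        elif ch == CLOSE:
            if stack:
                stack.pop()        # closes the most recent open quote
            else:
                unmatched.add(i)   # a closer with nothing to close
    unmatched.update(stack)        # openers that were never closed
    return unmatched


def quote(s):
    """Wrap s in « », escaping only at the top level (hypothetical helper)."""
    bad = unmatched_quotes(s)
    out, depth = [], 0
    for i, ch in enumerate(s):
        if i in bad:                       # unmatched quotes are always top-level
            out.append(ESC + ("<" if ch == OPEN else ">"))
        elif ch == ESC and depth == 0:     # top-level escape character
            out.append(ESC + "_")
        else:                              # everything else is copied verbatim
            if ch == OPEN:
                depth += 1
            elif ch == CLOSE:
                depth -= 1
            out.append(ch)
    return OPEN + "".join(out) + CLOSE


def unquote(q):
    """Invert quote(); raises ValueError on malformed input."""
    if len(q) < 2 or q[0] != OPEN or q[-1] != CLOSE:
        raise ValueError("not a quoted string")
    body, out, depth, i = q[1:-1], [], 0, 0
    while i < len(body):
        ch = body[i]
        if ch == ESC and depth == 0:
            if i + 1 >= len(body) or body[i + 1] not in DECODE:
                raise ValueError("bad escape at top level")
            out.append(DECODE[body[i + 1]])
            i += 2
            continue
        if ch == OPEN:
            depth += 1
        elif ch == CLOSE:
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced quotes in body")
        out.append(ch)
        i += 1
    return "".join(out)


print(quote("say «hi»"))   # «say «hi»»  (balanced, so unchanged inside)
print(quote("a»b"))        # «a\>b»
print(quote(quote("a»b"))) # ««a\>b»»   (re-quoting adds only the outer pair)
```

In this sketch an already-quoted string is balanced, so quoting it again adds two characters and nothing else, and the escape sequences never contain « or », so balance can be checked without looking at escapes. It does seem to round-trip on the cases I tried, but treat that as evidence rather than proof.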
That leaves only the problem of escaping the escape character, and here again there is no need to constrain ourselves to ASCII. There is no reason that the escape character needs to be backslash. In fact, that is a particularly poor choice because backslash, being an ASCII character, is extremely precious real estate. It is doubly precious because it actually has a balanced partner in the forward slash, so if you are going to use backslash for any special purpose it should be partnered with forward slash as a balanced set (which opens up the problem of what to use for the directory delimiter in your operating system, but that's another can o' worms).
I think the Right Answer is simply to choose a different character to serve as the escape character inside balanced strings. My first pick would probably be ␛, but there are obviously a lot of other possibilities.
This points to a potential danger of this approach: there are a lot of Unicode characters that render very similarly, like U and ᑌ. You would need to choose the Unicode characters with special meanings very judiciously, and make sure that when you are writing code you have an editor that renders them in some distinctive way so you can be sure you're typing what you think you're typing. But that seems doable.
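Short of editor support, one cheap way to check what you actually typed is to print each character's code point and Unicode name. This is only a sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' label per character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Look-alike letters and the special characters discussed above.
for label in describe("Uᑌ«»␛"):
    print(label)
```

Two characters that look the same on screen will still get different labels here, which is the property you would want from the editor.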