by undershirt 3358 days ago
> Note that string delimiters in Dern are [ and ] and not "; this way dern code can be written inside C-programs without escaping.

Interesting approach. I suppose this makes whitespace inside square brackets significant. And quotes inside these brackets would still have to be escaped when written inside C programs.


I'm not sure why you think whitespace inside square brackets would be significant. I didn't get that from the OP.

I took this square-bracket approach as well in my teaching language (http://akkartik.name/post/mu). Benefits:

a) Since there are distinct characters for each end, nesting strings is convenient.

b) Since there are distinct characters for each end, the need for escaping is reduced. (I reduced the need for escaping further by having a backslash escape a whole series of backslashes rather than just the very next character. That way each iteration of escaping increases a run of backslashes by one rather than doubling it.)

c) In the context of a teaching language, I found it useful to pun square brackets for function definitions and strings. That allowed me to explain this:

  def factorial x:num -> result:num [
    ...
  ]
as a command (def) that takes some primitive arguments and a bit of text, turns it into code and adds it to the list of a computer's functions, somehow. That helps introduce students to the idea of a compiler. (Long before we delve further into it, of course.)
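To make point (b) concrete, here is a small Python sketch that escapes the text `]` again and again under two schemes. It assumes conventional escaping puts a backslash in front of every backslash and every bracket, and it models the run-based rule as I read it from this thread: one extra backslash goes in front of each run of backslashes (or bare bracket), and the run plus the character after it are copied unchanged. `escape_conventional` and `escape_run` are hypothetical names, not functions from Mu.

```python
def escape_conventional(text: str) -> str:
    """C-style escaping: every backslash and every bracket gets its own backslash."""
    out = []
    for ch in text:
        if ch in "\\[]":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def escape_run(text: str) -> str:
    """Run-based escaping (a reading of the rule described above): one extra
    backslash goes in front of each run of backslashes or bare bracket, and the
    run plus the character after it are copied unchanged."""
    out = []
    i = 0
    while i < len(text):
        if text[i] in "\\[]":
            out.append("\\")                # the single added escape
            while i < len(text) and text[i] == "\\":
                out.append(text[i])         # copy the existing run as-is
                i += 1
            if i < len(text):
                out.append(text[i])         # the character the run escapes
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Escape the one-character text "]" four times with each scheme.
conv = run = "]"
for n in range(1, 5):
    conv, run = escape_conventional(conv), escape_run(run)
    print(f"pass {n}: conventional {conv:<16} ({len(conv)})  run-based {run} ({len(run)})")
```

On each pass the conventional scheme doubles the length (2, 4, 8, 16 characters) while the run-based one adds a single character (2, 3, 4, 5).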

The drawback of punning square brackets for function definitions and strings is that in some contexts I want comments to be recognized and in others I want them ignored. The rule Mu's lexer uses pervasively is that if the very next character after the '[' is a newline, it takes comments into account (ignoring any brackets inside them) when matching nested square brackets to determine where the string ends. This only makes sense because Mu is statement-oriented: every statement is required to start on a new line, and there's no delimiter like a semicolon to get around that.
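A minimal Python sketch of that bracket-matching rule, under two assumptions: Mu comments run from `#` to the end of the line (as in the Mu snippets elsewhere in this thread), and backslash escapes are left out to keep it short. `find_close` is a hypothetical helper, not Mu's lexer.

```python
def find_close(src: str, open_pos: int) -> int:
    """Return the index of the ']' that matches the '[' at src[open_pos].

    If the '[' is immediately followed by a newline (a code block), text from
    '#' to the end of the line is treated as a comment and brackets inside it
    are not counted. Otherwise (an inline string) '#' is ordinary text.
    Backslash escapes are deliberately left out of this sketch.
    """
    if src[open_pos] != "[":
        raise ValueError("expected '[' at open_pos")
    skip_comments = src[open_pos + 1 : open_pos + 2] == "\n"
    depth = 0
    i = open_pos
    while i < len(src):
        ch = src[i]
        if skip_comments and ch == "#":
            end_of_line = src.find("\n", i)
            if end_of_line == -1:
                break                       # comment runs to end of input
            i = end_of_line
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated '['")


block = "def f [\n  x <- copy 0  # ] is not a real bracket here\n]"
print(find_close(block, block.index("[")) == len(block) - 1)   # True
print(find_close("[x # ] y ]", 0))                              # 5
```

In the second call the `[` isn't followed by a newline, so the `]` after `#` closes the string.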

Anyways, this is sort of an extended description of a very narrow set of design decisions. Just in case someone finds it useful.

I would think that whitespace becomes significant inside string-delimiting square brackets for the same reason that whitespace is significant inside string-delimiting quotes: whitespace characters are characters like any other. I would expect `[ abcd ]` to be different from `[abcd]` in the same way that `" abcd"` is different from `"abcd"` in languages that use quotes.

Is there something I'm missing that mitigates this intuition? Does this language (or, for that matter, the demo language you wrote for your class) ignore leading or trailing whitespace inside square brackets? What about excess whitespace between words (that is, whitespace beyond a single space or tab)? And if leading, trailing, or excess whitespace really is collapsed inside square-bracket-delimited strings, how would I create a string with leading or trailing whitespace, or extra space between words, if I wanted to?

Honest questions; don't mean to criticize, just eager to learn.

Oh I see. Perhaps I misunderstood what undershirt meant by "significant whitespace". Yes, in Mu [ abcd ] is different from [abcd]. I believe that to be true about Dern as well. This is all exactly as for text inside double-quotes in C.

> I reduced the need for escaping further by having a backslash escape a whole series of backslashes rather than just the very next character.

How do you write the string with characters '\' and ']'? It seems like the natural way to write it, [\\\]], would end up being lexed as [\\\] (a string with two backslashes) followed by an unmatched string end character ']'.

It's just [\\]]. "\\]" is an escape for "\]".

I took this as an opportunity to finally try out Mu. Very readable code!

Is there a way to write a string that ends with a backslash? The way `slurp_one_past_backslashes` is used seems to make that impossible.

EDIT: I thought typing a literal 0 byte might work, but apparently Mu uses a different mechanism to delimit strings in memory.
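Here is a Python sketch of a lexer following the escape rule as I understand it from this thread: a backslash is dropped, then the run of backslashes after it plus one more character are copied literally; unescaped brackets nest, and whitespace is kept as written. `read_bracket_string` is a hypothetical name, and this is not Mu's actual code.

```python
def read_bracket_string(src: str) -> tuple[str, int]:
    r"""Lex a bracket-delimited string that starts at src[0] == '['.

    Returns (contents, index just past the closing ']').
    Escape rule, as read from this thread: a backslash is dropped, then the
    run of backslashes after it plus ONE more character are copied literally,
    so the source [\\]] yields the two characters \ and ].
    Unescaped brackets nest; whitespace is kept exactly as written.
    """
    if not src.startswith("["):
        raise ValueError("expected '['")
    out = []
    depth = 1
    i = 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 1                          # drop the escaping backslash
            while i < len(src) and src[i] == "\\":
                out.append(src[i])          # copy the run of backslashes
                i += 1
            if i < len(src):
                out.append(src[i])          # ...and one character past it
                i += 1
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated string")


print(read_bracket_string(r"[\\]]")[0])          # \]
print(repr(read_bracket_string("[ abcd ]")[0]))  # ' abcd '
try:
    read_bracket_string(r"[abc\\]")              # trying to end with a backslash
except ValueError as err:
    print(err)                                   # unterminated string
```

Under this reading, any attempt to end a string with a backslash (`[abc\]`, `[abc\\]`, and so on) fails the same way, since the backslash run always carries the closing `]` along with it.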

To clarify my comment from last night: the following code[1]:

  x:address:array:char <- new [abc]
would look like this in memory, assuming the allocator returned address 1000 as the value of x:

  1000: 3
  1001: 97  # a
  1002: 98  # b
  1003: 99  # c
That's it. There's no trailing null character. Address 1004 is not part of this allocation.

The length of the array cannot be modified, only read (using the length instruction). If we need a larger array, we must allocate a new one and copy the contents over, just like in C.
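A rough Python model of the layout described above, assuming memory is a map from addresses to numbers with one character code per location, and taking the address 1000 as given rather than modeling an allocator. `new_text`, `read_text` and `grow_copy` are hypothetical helpers, not Mu instructions. Since the length lives in the first location, a 0 inside the text is just another character, and getting a longer array means copying into a new one.

```python
def new_text(memory: dict[int, int], addr: int, s: str) -> int:
    """Store s at addr: the length first, then one character code per location.
    Nothing is written after the last character."""
    memory[addr] = len(s)
    for offset, ch in enumerate(s, start=1):
        memory[addr + offset] = ord(ch)
    return addr


def read_text(memory: dict[int, int], addr: int) -> str:
    """Read back exactly memory[addr] characters; no terminator is consulted."""
    n = memory[addr]
    return "".join(chr(memory[addr + k]) for k in range(1, n + 1))


def grow_copy(memory: dict[int, int], old: int, new: int, extra: int) -> int:
    """The length can't change in place, so build a longer array at `new`
    and copy the old elements over; the extra slots start out as 0."""
    n = memory[old]
    memory[new] = n + extra
    for k in range(1, n + 1):
        memory[new + k] = memory[old + k]
    for k in range(n + 1, n + extra + 1):
        memory[new + k] = 0
    return new


memory: dict[int, int] = {}
x = new_text(memory, 1000, "abc")
for addr in range(1000, 1004):
    print(f"{addr}: {memory[addr]}")       # 1000: 3, then 97, 98, 99
print(1004 in memory)                      # False

y = new_text(memory, 2000, "a\0b")         # an embedded 0 is just data
print(read_text(memory, y) == "a\0b")      # True

z = grow_copy(memory, x, 3000, 1)          # hypothetical "allocate bigger and copy"
print(memory[z], memory[x])                # 4 3
```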

The elements of an array can be read and written using the index and put-index instructions, respectively. Here's a fragment of code that makes the final character of x a backslash:

  len:num <- length *x
  last:num <- subtract len, 1
  put-index *x, last, 92  # ascii code for backslash
[1] The verbose "address:array:char" (read as "address to an array of characters") is typically abbreviated as "text" in Mu programs. I've written out the full type to make things more explicit.

Ah, you're right! Trailing backslashes can't be represented. I need to rethink this.

If you want to construct such a string from scratch: strings, like arrays in general, are prefixed with their length, so using put-index on the final index would be the workaround.
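That workaround, modeled in Python: memory is assumed to be a map from addresses to numbers, with the length in the first location followed by one character code per location (the layout described earlier in this thread). `length` and `put_index` are hypothetical stand-ins named after the Mu instructions, and the bounds check is my addition. The string is created with a placeholder as its last character, which is then overwritten with 92, the ASCII code for backslash.

```python
def store_text(memory: dict[int, int], addr: int, s: str) -> None:
    """Length-prefixed layout: memory[addr] = len(s), then character codes."""
    memory[addr] = len(s)
    for offset, ch in enumerate(s, start=1):
        memory[addr + offset] = ord(ch)


def length(memory: dict[int, int], addr: int) -> int:
    """Stand-in for Mu's `length *x`: read the stored length."""
    return memory[addr]


def put_index(memory: dict[int, int], addr: int, i: int, value: int) -> None:
    """Stand-in for Mu's `put-index *x, i, value` (bounds check added here)."""
    if not 0 <= i < memory[addr]:
        raise IndexError(f"index {i} out of bounds")
    memory[addr + 1 + i] = value


def load_text(memory: dict[int, int], addr: int) -> str:
    """Decode the length-prefixed characters back into a Python string."""
    return "".join(chr(memory[addr + 1 + k]) for k in range(memory[addr]))


memory: dict[int, int] = {}
x = 1000
store_text(memory, x, "abc_")        # '_' is a placeholder for the final char
last = length(memory, x) - 1
put_index(memory, x, last, 92)       # 92 is the ASCII code for backslash
print(load_text(memory, x))          # abc\
```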