This page doesn't do a good job of explaining what is interesting or unique about the language or why I might want to learn more about it. I'd suggest revising the 'About' section to focus more on answering those questions. Although given its name means 'obscure language' maybe that's deliberate.
Yeah I think this was more of a fun-time hobby project than a serious candidate for the NBL. Quoting the note:
"This is the first language I have ever designed or implemented. There are also a lot of features not yet implemented and probably bugs not yet fixed. The interpreter is currently also just a tree walker, and not a faster bytecode vm."
> Note that string delimiters in Dern are [ and ] and not "; this way dern code can be written inside C-programs without escaping.
Interesting approach. I suppose this makes whitespace inside square brackets significant. And quotes inside of these brackets would have to be escaped if written inside C-programs.
a) Since there are distinct characters for each end, nesting strings is convenient.
b) Since there are distinct characters for each end, the need for escaping is reduced. (I reduced the need for escaping further by having a backslash escape a whole series of backslashes rather than just the very next character. That way each iteration of escaping increases a run of backslashes by one rather than doubling it.)
c) In the context of a teaching language, I found it useful to pun square brackets for function definitions and strings. That allowed me to explain this:
def factorial x:num -> result:num [
...
]
as a command (def) that takes some primitive arguments and a bit of text, turns it into code and adds it to the list of a computer's functions, somehow. That helps introduce students to the idea of a compiler. (Long before we delve further into it, of course.)
The drawback of punning square brackets for function definitions and strings is that in different contexts I want to permit or be oblivious to comments. The rule Mu's lexer uses pervasively is that if the very next character after the '[' is a newline, it's sensitive to comments when detecting nested square brackets to determine where the string ends. This only makes sense because Mu is statement-oriented; every statement is required to start on a new line, and there's no delimiter like a semi-colon to get around that.
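For anyone who wants to see that rule spelled out, here is a rough Python sketch. It is not Mu's lexer: the function name `find_string_end` is made up, it assumes comments start with `#` and run to the end of the line, and it ignores backslash escapes entirely.

```python
def find_string_end(src, start):
    """Return the index of the ']' that closes the '[' at src[start].

    Hypothetical helper, not Mu's real lexer. Assumes '#' starts a comment
    that runs to end of line, and ignores backslash escapes.
    """
    assert src[start] == "["
    # The rule from the comment above: only pay attention to comments
    # when the '[' is immediately followed by a newline.
    comment_aware = src.startswith("\n", start + 1)
    depth = 0
    i = start
    while i < len(src):
        ch = src[i]
        if comment_aware and ch == "#":
            # Skip the rest of the line so brackets in comments don't count.
            newline = src.find("\n", i)
            if newline == -1:
                break
            i = newline
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced square brackets")
```

With this sketch, a `]` inside a comment is skipped in a block that opens with `[` plus newline, while in an inline string like `[a # b]` the `#` is just another character.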
Anyways, this is sort of an extended description of a very narrow set of design decisions. Just in case someone finds it useful.
I would think that whitespace becomes significant inside string-delimiting square brackets for the same reason that whitespace is significant inside string-delimiting quotes: whitespace characters are characters like any other. I would expect `[ abcd ]` to be different from `[abcd]` in the same way that `" abcd"` is different from `"abcd"` in languages that use quotes.
Is there something I'm missing that mitigates this intuition? Does this language (or, for that matter, the demo language you wrote for your class) ignore leading or trailing whitespace inside square brackets? What about excess whitespace between words (that is, whitespace beyond a single space or tab)? If so, if indeed leading/trailing/excess whitespace is collapsed inside of square bracket delimited strings, how would I create a string with leading or trailing whitespace or extra space between words if I wanted to?
Honest questions; don't mean to criticize, just eager to learn.
Oh I see. Perhaps I misunderstood what undershirt meant by "significant whitespace". Yes, in Mu [ abcd ] is different from [abcd]. I believe that to be true about Dern as well. This is all exactly as for text inside double-quotes in C.
> I reduced the need for escaping further by having a backslash escape a whole series of backslashes rather than just the very next character.
How do you write the string with characters '\' and ']'? It seems like the natural way to write it, [\\\]], would end up being lexed as [\\\] (a string with two backslashes) followed by an unmatched string end character ']'.
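A small Python sketch of the problem, under one reading of the scheme (my assumption, not Mu's actual implementation): a run of k backslashes is written as k+1 backslashes, and a bracket is escaped with a single backslash. The names `naive_encode` and `literal` are made up.

```python
import re

def naive_encode(text):
    """Escape text under the assumed scheme (hypothetical, not Mu's code)."""
    # Assumption: a run of k backslashes is written as k + 1 backslashes.
    text = re.sub(r"\\+", lambda m: "\\" + m.group(0), text)
    # Assumption: each square bracket is escaped with one backslash.
    return re.sub(r"([\[\]])", r"\\\1", text)

def literal(text):
    """Wrap the escaped text in square-bracket string delimiters."""
    return "[" + naive_encode(text) + "]"

backslash_then_bracket = literal("\\]")   # source text: [\\\]]
two_backslashes = literal("\\\\")         # source text: [\\\]
```

The second literal is a prefix of the first, so a lexer scanning left to right can't tell whether the `]` after the three backslashes ends the string or belongs to it.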
To clarify my comment from last night: the following code[1]:
x:address:array:char <- new [abc]
would look like this in memory, assuming the allocator returned address 1000 as the value of x:
1000: 3
1001: 97 # a
1002: 98 # b
1003: 99 # c
That's it. There's no trailing null character. Address 1004 is not part of this allocation.
The length of the array cannot be modified, only read (using the instruction length). If we need a larger array we must allocate a new one and copy the contents over, just like in C.
The elements of an array can be read and written using instructions index and put-index, respectively. Here's a fragment of code to make the final character of x a backslash:
len:num <- length *x
last:num <- subtract len, 1
put-index *x, last, 92 # ascii code for backslash
[1] The verbose "address:array:char" (read as "address to an array of characters") is typically abbreviated as "text" in Mu programs. I've written out the full type to make things more explicit.
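If it helps, here is the same layout modeled in Python. This is only a toy: the `Memory` class and `new_text` are made up for illustration (only length, index and put-index correspond to Mu instructions), and the bounds check is my own choice for the sketch rather than a claim about what Mu does.

```python
class Memory:
    """Toy word-addressed memory; hypothetical, for illustration only."""

    def __init__(self, first_free=1000):
        self.cells = {}
        self.next_free = first_free

    def new_text(self, s):
        """Allocate a length-prefixed array of character codes."""
        addr = self.next_free
        self.cells[addr] = len(s)            # first cell holds the length
        for offset, ch in enumerate(s, start=1):
            self.cells[addr + offset] = ord(ch)
        self.next_free = addr + 1 + len(s)   # no trailing null is reserved
        return addr

    def length(self, addr):
        return self.cells[addr]

    def index(self, addr, i):
        self._check(addr, i)
        return self.cells[addr + 1 + i]

    def put_index(self, addr, i, value):
        self._check(addr, i)
        self.cells[addr + 1 + i] = value

    def _check(self, addr, i):
        # Bounds check is an assumption of this sketch.
        if not 0 <= i < self.cells[addr]:
            raise IndexError("index out of bounds")


mem = Memory()
x = mem.new_text("abc")        # occupies 1000..1003; 1004 is untouched
last = mem.length(x) - 1
mem.put_index(x, last, 92)     # make the final character a backslash
```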
Ah, you're right! Trailing backslashes can't be represented. I need to rethink this.
If you want to construct it from scratch, strings and arrays in general are prefixed with their length. put-index on the final index would be the workaround.
> Every variable and function definition in Dern must be documented by a documentation string. Dern also makes sure that every formal function parameter is documented in the documentation of function definition.
Interesting. Should this be available as a compiler warning in other languages?
Definitely not. I can't imagine exploratory programming in a language that dictates overhead right from the beginning, and even later on some stuff is just too obvious to document.
Compilers should stick to their job: turning human readable code into efficient machine executable code with a minimum of fuss, a maximum of speed and errors where the human readable code leads to either undefined behavior or is simply incorrect.
Enforcing coding styles and documentation requirements should be left to plug-ins for your favorite CI setup or, alternatively, to standalone executables or scripts.
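As a sketch of what such a standalone script could look like, here is a check along the lines of the Dern rule, applied to Python source using the standard `ast` module. The function name `undocumented_items` is made up, and the "is the parameter documented" test is a crude substring match, so treat it as an illustration rather than a real linter.

```python
import ast

def undocumented_items(source):
    """List functions lacking a docstring or not mentioning a parameter.

    Hypothetical check for illustration. The substring match is crude:
    a parameter named 'x' counts as documented if the docstring merely
    contains the letter 'x'.
    """
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node)
        if doc is None:
            problems.append(f"{node.name}: missing docstring")
            continue
        for arg in node.args.args + node.args.kwonlyargs:
            if arg.arg not in doc:
                problems.append(
                    f"{node.name}: parameter '{arg.arg}' not documented"
                )
    return problems
```

A CI job could run something like this over the changed files and fail the build on a non-empty result, leaving the compiler out of it.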
The only thing forced documentation will lead to is lots of boilerplate or blank stuff to satisfy the compiler; it will not lead to better documentation.
This is roughly the state of affairs with lots of autodoc docs: it's rare to see good documentation that has been automatically produced, but it is very common to see page after page of extremely poor automatically generated documentation. Which then of course gets no love at all because, after all, the documentation is already done.
It seems to be a dynamically typed language. So they require developers to put type annotations into comments so that the compiler cannot make use of them. Oh well.