by psnj 5653 days ago
I was surprised by my almost-panicky reaction to seeing:

  Identifiers Can Have Blanks

  open_window_with_attributes(...)
  becomes:
  open window with attributes (...)
I think I actually felt that wrongness in my stomach. Like a more intense version of seeing our corporate network shared drive's files with spaces and parens in them.

I guess I'm old.


I had a similar reaction, and I'm not sure that it's a "damn kids, get off my lawn" reaction. Specifying an unambiguous grammar may be difficult - which implies parsing may become a problem.

An implementation exists, so the author has something working, but I'm wondering how robust the parsing is. I haven't seen many code examples (only short fragments on the page), so I don't know what potential issues, if any, there are. But this is the sort of thing that could significantly complicate adding new language features that require additional syntax.

edit: I'm perusing the source for the compiler, which is of course written in Zinc. This code from the main driver of the compiler perhaps gives a better feel for how it may look in practice:

  while i < argc
    def arg = argv[i]

    if is equal (arg, "-debug")
      debug = true

    elsif is equal (arg, "-v")
      version = true

    elsif is equal (arg, "-u")
      unicode = true

    elsif is equal (arg, "-o") && i < argc-1
      out filename = new string (bundle, argv[++i])
      to OS name (out filename)

    elsif is equal (arg, "-I") && i < argc-1
      append (include path, new string (bundle, argv[++i]))

    else
      filename = new string (bundle, arg)

    end

    ++i
  end
From an aesthetic point of view, it doesn't look that bad. In this example, I think "is equal", "out filename", "to OS name" and "include path" are all identifiers. But I'm still wondering what kinds of parsing and lexing issues may arise.
I already have a hard time parsing this code. The main problem I see is that, to read the code, I have to know every single keyword in the language.

For example I was wondering if "new" is a keyword. If it is, then "new string ()" might be something interesting, otherwise it's just a function call.

Similarly, this raises the question of whether I can write the following code:

  if end of line (str)
This might or might not be permitted because "end" is a keyword. If it is permitted, then the result looks pretty damn ambiguous to me. If it's not then I have to name my identifier differently, like so:

  if end_of_line (str)
But then I'm screwing up the style of my code...
I thought of the keyword-recognition issue, but then I dismissed it: syntax highlighting makes it a non-issue. The ambiguity of using keywords in identifiers is valid, though.
Syntax highlighting is not available everywhere, e.g. in black-and-white print.
Usually black-and-white print uses bold to indicate keywords.
Your complaint is that you can't parse the code with 100% certainty without some basic knowledge of the language? That hardly seems like a complaint at all. The same is true of any language that isn't explicitly identical to one you already know.
No, his complaint is that you can't parse the code without knowing all the keywords in the language.

He wrote every single keyword == all the keywords.

That's not basic knowledge of a language.

Maybe I'm odd, but when I start on a new language I don't learn all the syntax first; I usually start mucking around with variable declarations, iterators, simple stuff like that, just to get a feel for it. I'm guessing it's not that odd, as most tutorials also follow that approach.

I don't see how what you said disagrees with what I said aside from a minor semantic quibble. You can't parse any other language with certainty without knowing all the keywords either. For example, in Ruby, you might see the identifier "continue" by itself. Is this a method call, variable access or keyword? What about "private"? No way to know if you don't know all the keywords. This is precisely cognate with nene's objection that you can't tell whether "new" is a keyword in Zinc if you don't know the language's keywords.

You can form a rough guess of what the various tokens are in a snippet of Ruby code without knowing all the keywords, and you can do the same with Zinc code. Any additional difficulty is most likely because you're less familiar with Zinc, not because it has the nearly universal property of needing to know the full grammar to correctly parse arbitrary programs.

And I agree, playing around with a language is a great way to learn. But if you play around without reading about the things you're doing first, you should expect not to always know what you're doing. That's a huge part of the learning process.

No, you can't really do the same with Zinc. In Ruby, if there's a keyword somewhere, it's pretty damn obvious it's not part of an identifier.

With Zinc, you need to know what all the keywords are just to tell what the name of something is.

And that doesn't even start on additions to the language completely breaking your code because you used that word in an identifier somewhere...

Issues only arise if the language designer wants to use spaces for something else as well (like function application).
While I didn't panic, I find myself having quite a negative reaction to a language in which "Identifiers can have blanks" is listed under main features.

EDIT: Also, I see quite an opportunity for wrong parsing, not on the machine side but on the human side. Blanks already have a function in other programming languages: they separate symbols. By giving them this double meaning, you bring context into the parsing of any piece of code, which I think could be a pretty painful exercise.

In other words: don't design a language feature because it makes code easier to type if it doesn't also make it easier to read.

(I know the author thinks it's easier to read, but I'm not yet convinced.)

I don't see why parsing would be a problem; identifiers (and their pieces) always start with letters (so no "var 1"), are alphanumeric, and cannot be reserved words.

Meaning, parse word by word until you hit a key word or a significant character (,:". etc). You can't have "varb function(arg)" or its equivalent in any language I know, because it doesn't make sense - there's no operation on the varb, it's just "there". Similarly, "x y z = q r t" is unambiguous, because there's no stop to parsing either "x y z" or "q r t".
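The word-by-word rule described above can be sketched as a small lexer. This is a minimal sketch under stated assumptions, not Zinc's actual implementation: the keyword set is hypothetical (the thread never lists Zinc's reserved words), and only words, integer and string literals, and single-character symbols are recognized.

```python
import re

# Hypothetical reserved words; the thread never lists Zinc's actual keywords.
KEYWORDS = {"if", "elsif", "else", "end", "while", "def"}

# Alternatives, tried in order: a string literal, a word (letter first, then
# letters, digits or underscores), an integer, or any other single non-blank char.
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z][A-Za-z0-9_]*|\d+|\S')


def tokenize(line: str) -> list[tuple[str, str]]:
    """Merge runs of consecutive non-keyword words into one identifier."""
    tokens: list[tuple[str, str]] = []
    pending: list[str] = []  # words of the identifier currently being built

    def flush() -> None:
        # Emit the pending words, if any, as a single blank-joined identifier.
        if pending:
            tokens.append(("IDENT", " ".join(pending)))
            pending.clear()

    for match in TOKEN_RE.finditer(line):
        text = match.group()
        if text in KEYWORDS:
            flush()
            tokens.append(("KEYWORD", text))
        elif text[0].isalpha():
            pending.append(text)  # an identifier may continue past a blank
        else:
            flush()
            if text.startswith('"'):
                kind = "STRING"
            elif text.isdigit():
                kind = "NUMBER"
            else:
                kind = "SYMBOL"
            tokens.append((kind, text))
    flush()
    return tokens
```

On `elsif is equal (arg, "-o")` this yields the keyword `elsif`, the identifier `is equal`, and then `(`, `arg`, `,`, `"-o"`, `)`. With `end` treated as reserved, `if end of line (str)` lexes as the keyword `end` followed by the identifier `of line`, which is the ambiguity raised elsewhere in this thread.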

I think I'd like it. Hitting shift all the time, or reaching for "_" is a PITA and significantly slows my typing. It's especially annoying when you realize that identifiers with blanks could be leveraged into most languages with almost zero change to the parser, as long as it requires an end-of-statement terminator or ends on newlines.

> Meaning, parse word by word until you hit a key word or a significant character (,:". etc).

If keywords are allowable in identifiers (such as "end of file"), then your algorithm is not sophisticated enough. When you encounter a token that is the same token as a keyword, you need to use context to determine if it is actually a keyword or part of an identifier.

This may be a serious problem if the grammar has "<identifier> <keyword>" in it. That is, "X keyword" could be the identifier "X keyword" or it could be the identifier "X" followed by "keyword." There's a reason that most programming languages require that identifiers are a single token.
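To make the "X keyword" problem concrete, the sketch below enumerates every token sequence a lexer could produce when a keyword-spelled word may either act as a keyword or be absorbed into a multi-word identifier. The keyword set is hypothetical, and keywords are wrapped in angle brackets only to tell them apart in the output.

```python
from itertools import product

# Hypothetical reserved words that the language also allows inside identifiers.
KEYWORDS = {"if", "do", "end"}


def readings(words: list[str]) -> list[list[str]]:
    """Every token sequence obtainable by treating each keyword-spelled word
    either as a keyword (shown as <word>) or as part of an identifier."""
    choice_points = [i for i, w in enumerate(words) if w in KEYWORDS]
    results: list[list[str]] = []
    for choice in product([True, False], repeat=len(choice_points)):
        is_keyword = dict(zip(choice_points, choice))
        tokens: list[str] = []
        pending: list[str] = []  # words of the identifier being built
        for i, word in enumerate(words):
            if is_keyword.get(i, False):
                if pending:
                    tokens.append(" ".join(pending))
                    pending = []
                tokens.append(f"<{word}>")
            else:
                pending.append(word)
        if pending:
            tokens.append(" ".join(pending))
        results.append(tokens)
    return results
```

For `x end` this returns both `["x", "<end>"]` and `["x end"]`. Each keyword-spelled word doubles the candidates, so k such words give 2^k sequences, and only grammar context can choose among them.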

> When you encounter a token that is the same token as a keyword, you need to use context to determine if it is actually a keyword or part of an identifier.

You're presuming here that a space delimits tokens. In this language, that may not be the case. The lexer may create a single token from "a b c".

>If keywords are allowable in identifiers

Big "if" (why shouldn't it disallow them?), and completely resolved by modifying your naming scheme in those situations: EndOfFile is unambiguous, as is end_of_file, ifSuccess, etc.

It's unusual, as most programming languages allow keywords to appear inside identifiers (for example, new_thing is a legal C++ identifier). Further, if I understand the language correctly, the literal "end_of_file" becomes the same identifier as "end of file". And the stated purpose of allowing white space in identifiers is to avoid camel case and underscores.
I don't think that's the case. I think the example was just to show how you can write with spaces instead of underscores. I could be wrong though, I haven't tried the language.

The documentation doesn't state one way or the other, but it does include underscores as part of identifiers, and doesn't mention any stripping. Only that spaces are ignored entirely.
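If "spaces are ignored entirely" means blanks are simply deleted when identifiers are compared, identifier equality reduces to the normalization below. That reading is an assumption based on the description above, not verified against the Zinc compiler; underscores and letter case are assumed to be kept as written.

```python
def canonical(name: str) -> str:
    # Assumption: blanks are deleted, while underscores and case are kept.
    return "".join(name.split())


def same_identifier(a: str, b: str) -> bool:
    # Two spellings refer to the same variable iff their canonical forms match.
    return canonical(a) == canonical(b)
```

Under this assumption, `out filename` and `outfile name` name the same variable, while `end_of_file` and `end of file` do not.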

That strikes me as giving the lie to the "Ruby-like syntax" claim; ask a Ruby programmer what that line means and you will not get the correct answer for Zinc.

Actually, the connection with Ruby is tenuous anyhow; Ruby and assembler just don't go together. An assembler should produce a very clear one-to-one correspondence of instruction to machine language opcode, pretty much by definition. A high-level language can turn a simple statement into arbitrarily-complicated run-time code, pretty much by definition. Neither of these is a criticism by any means; it's just what they are. There isn't much syntax cross-talk to be had there.

> A high-level language can turn a simple statement into arbitrarily-complicated run-time code, pretty much by definition.

There are some high level languages where there is a pretty straightforward one-to-one correspondence of statement to bytecode(s).

> There isn't much syntax cross-talk to be had there.

Explain the existence of Forth.

I said "run-time code", not bytecodes. I'm talking about what actually executes. I've seen "bytecodes" that qualify as high-level languages by this standard, like CPython bytecode. Is that even so surprising? Single bytecodes for OO languages can translate to a lot of work to resolve.

And, what about Forth? It's a fairly low-level language by this standard. It has convenient ways to link together a lot of little functions, but one word does not dispatch on types and expand operator overloading and do the other things that can result in one line of C++ producing half a kilobyte of code, to say nothing of the functions that half-a-kilobyte may be invoking. Nor do I see why you think that's related to the syntax point.

I really have no idea what points you or your upmodders think you've won.

Actually, bytecodes for languages like Smalltalk can give you control over all of your runtime state, down to the level of bits. (Squeak actually runs bit-identical on something like 50 environments!)

As for precisely what runtime instructions are executed, most of the time we can consider this to be an implementation detail. In the case of superscalar processors, you can't necessarily tell me in what order your assembly language instructions are executed.

> And, what about Forth? It's a fairly low-level language by this standard.

It bridges the gap between high-level and low level. It's a clear piece of evidence that there isn't such a huge gulf as you claim.

> but one word does not dispatch on types and expand operator overloading and do the other things that can result in one line of C++ producing half a kilobyte of code

There are high level languages that don't do this either. Actually, I know of a specialized declarative Smalltalk that has gotten the entire image down to 45k. A Smalltalk VM is basically little more than a 256 branch switch statement, plus message dispatch, plus GC.
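The switch-statement part of that description can be illustrated with a toy stack machine. This sketch uses hypothetical opcodes (not Smalltalk's bytecode set) and leaves out message dispatch and garbage collection entirely; it only shows the fetch-decode-dispatch loop that such a VM is built around.

```python
# Hypothetical opcodes for illustration; these are not Smalltalk's bytecodes.
PUSH_CONST, ADD, MUL, PRINT, HALT = range(5)


def run(code: list[int], consts: list[int]) -> list[int]:
    """Fetch-decode-dispatch loop over a flat bytecode array."""
    stack: list[int] = []
    output: list[int] = []
    pc = 0  # program counter
    while True:
        op = code[pc]  # fetch
        pc += 1
        # decode + dispatch: one branch per opcode, like a big switch
        if op == PUSH_CONST:
            stack.append(consts[code[pc]])  # operand: index into consts
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            return output
        else:
            raise ValueError(f"unknown opcode {op}")


# (2 + 3) * 4
program = [PUSH_CONST, 0, PUSH_CONST, 1, ADD, PUSH_CONST, 2, MUL, PRINT, HALT]
print(run(program, [2, 3, 4]))  # [20]
```

A real VM with a full opcode set would have many more branches, but each one stays this small; most of the remaining complexity lives in the parts this sketch omits.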

The gulf isn't nearly as large as you imagine. Rather, there are a number of "high level" languages that are actually pretty minimal.

You've really missed my point. I pretty much defined high level and low level by how they expand out from simple instructions. You can't cite examples to prove this wrong; by definition, you've classified your examples wrong. That this is not a universal definition doesn't bother me one little bit, because there is no universal definition of any non-trivial software engineering term.
Your definition classifies some widely recognized "high level" languages as low level.
If you want to redefine a commonly used term, you're doing something wrong. What you need to do is define a new term. Replace "High Level Languages" with "Abstract Languages" and nobody would have a problem with what you said. It might not mean anything, but at least it's clear. However, when you redefine an existing term, you can be wrong, and people will call you on it.
Although I agree that it isn't really possible to have a "Ruby-like" syntax for a low level language since a lot of the syntax depends on Ruby being dynamic, it still seems like valid ambition, as long as you know that limitation.

I would love to have a form of C / C++ with iterators and blocks and without all the curly braces and assorted cruft like 5 different ASCII symbols being used in 20 different contexts (actually, Ruby does that too; when will language designers start using a few additional symbols to reduce cognitive load?).

From the article:

"I did this because I hate uppercase characters in the middle of identifiers and I'm too lazy to type shift to get the '_'. In addition, I find it more readable."

just-use-lisp-style-identifiers-then

>just-use-lisp-style-identifiers-then

hitting - is not significantly easier than hitting _ when compared to hitting the spacebar.

http://en.wikipedia.org/wiki/Fitts_law

This is yet another reason I enjoy typing in Dvorak :) (-_ is in a better place)
I honestly thought you'd posted a snippet of smalltalk or lisp or something for a second there. I think some people refer to that as a "brainfart."

(Note that I have no practical knowledge of either language. I probably wouldn't have admitted this if I did.)

Haha! I'm on Dvorak too and didn't understand at all why JohnnyCache thought hitting "-" was so much harder than Space! I guess I've been completely converted for too long.
but then you sacrifice the "-" (minus) infix operator
Unless it's surrounded by spaces, which it probably should be anyway. Compare these pathological cases for either approach:

    one-thing - another-thing
vs.

    one thing-another thing
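The trade-off between the two conventions shows up directly in a tokenizer. In the sketch below, identifiers are restricted to lowercase letters for brevity (an assumption, not either language's real rule); the Lisp-style pattern joins words with `-`, and the blank-style pattern joins them with single spaces. In both, a `-` that cannot continue an identifier is the minus operator.

```python
import re

# Lisp-style: letters joined by '-' form one identifier; a lone '-' is minus.
LISP_TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*|-")

# Blank-style: letters joined by single spaces form one identifier.
BLANK_TOKEN = re.compile(r"[a-z]+(?: [a-z]+)*|-")


def lisp_tokens(source: str) -> list[str]:
    return LISP_TOKEN.findall(source)


def blank_tokens(source: str) -> list[str]:
    return BLANK_TOKEN.findall(source)
```

Both pathological lines lex as their authors intended (`one-thing`, `-`, `another-thing` and `one thing`, `-`, `another thing`), but `a-b` shows the cost mentioned above: under Lisp-style rules it is a single identifier rather than a subtraction, while the blank-style rules read it as `a - b`.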
As much as your criticisms may be valid, I think he has given sufficient justification:

"I did this because I hate uppercase characters in the middle of identifiers and I'm too lazy to type shift to get the '_'. In addition, I find it more readable."

This kind of "Because I said so" reasoning is valid in pretty much any hobbyist-type situation as far as I'm concerned. If you don't like it, fork it.